Abstract

This article is devoted to nonlinear approximation and estimation via piecewise 
polynomials built on partitions into dyadic rectangles. The approximation rate is 
studied over possibly inhomogeneous and anisotropic smoothness classes that contain
Besov classes. Highlighting the interest of such a result in statistics, adaptation in the 
minimax sense to both inhomogeneity and anisotropy of a related multivariate density 
estimator is proved. Besides, that estimation procedure can be implemented with a 
computational complexity simply linear in the sample size. 
1 Introduction

When estimating a multivariate function, it seems natural to consider that its smoothness is 
likely to vary either spatially, or with the direction, or both. We will refer to the first feature 
as (spatial) inhomogeneity. If the risk is measured in an $L_q$-norm, measuring the smoothness
in an $L_p$-norm with $p < q$ makes it possible to take such an inhomogeneity into account - all
the more so as $p$ is smaller - in the sense that functions with some localized singularities and
otherwise flat parts may thus keep a high smoothness index. For the second feature, we will talk about
anisotropy, which is usually described by different indices of smoothness according to the 
coordinate directions. Yet, statistical procedures that adapt both to possible inhomogeneity 
and anisotropy remain rather scarce. Indeed, the existing literature seems to amount to 
the following references. Neumann and von Sachs [NvS97], for estimating the evolutionary
spectrum of a locally stationary time series, and Neumann [Neu00], in the Gaussian white
noise framework, study thresholding procedures in a tensor product wavelet basis. In a
Gaussian regression framework, Donoho [Don97] proposes the dyadic CART procedure, a
selection procedure among histograms built on partitions into dyadic rectangles, extended
to the density estimation framework by Klemelä [Kle09]. Last, Kerkyacharian, Lepski
and Picard [KLP01] introduce a kernel estimator with adaptive bandwidth in the Gaussian
white noise model. These authors study the performance of their procedures for the $L_2$-risk,
apart from the latter who consider any $L_q$-risk for $q > 1$. Neumann and von Sachs [NvS97]
measure the smoothness of the function to estimate in the Sobolev scale, whereas the others
consider the finer Besov scale. Besides, the $L_p$-norm in which the smoothness is measured is
allowed to vary with the direction, except in [Don97], but always constrained to be greater
than 1. Common to those few procedures is the ability to reach the minimax rate over a
wide range of possibly inhomogeneous and anisotropic classes, up to a logarithmic factor,
the unknown smoothness being as usual limited by the a priori fixed smoothness of the
underlying wavelets, piecewise polynomials or kernel.

Adaptation results of the aforementioned type rely as much on Statistics as on Approximation
Theory, oracle-type inequalities reflecting the interplay between both domains.
Assume for instance that the function $s$ to estimate lies in the set $\mathcal{F}([0,1]^d,\mathbb{R})$ of all
real-valued functions defined over the unit cube $[0,1]^d$, let $(S_m)_{m\in\mathcal{M}}$ be a given family of linear
subspaces of $\mathcal{F}([0,1]^d,\mathbb{R})$, and $\hat{s}$ a statistical procedure somehow based on that family. An
oracle-type inequality in the $L_q$-norm roughly takes the form

\[ \mathbb{E}_s\left[\|s-\hat{s}\|_q^q\right] \le C \inf_{m\in\mathcal{M}} \left\{ \inf_{t\in S_m}\|s-t\|_q^q + \left(\frac{\dim(S_m)}{n}\right)^{q/2} \right\}, \tag{1} \]

where $C$ is some positive constant and $n$ is the sample size, indicating that $\hat{s}$ is able to choose
a model $S_m$ in the family that approximately realizes the best compromise between the approximation error
and the dimension of the model. Equivalently, it may be written as



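\[ \mathbb{E}_s\left[\|s-\hat{s}\|_q^q\right] \le C \inf_{D\in\mathbb{N}^*} \left\{ \inf_{t\in\cup_{m\in\mathcal{M}_D} S_m}\|s-t\|_q^q + \left(\frac{D}{n}\right)^{q/2} \right\}, \tag{2} \]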



where $\mathcal{M}_D = \{m\in\mathcal{M} \text{ s.t. } \dim(S_m) = D\}$. On the other hand, the collection $(S_m)_{m\in\mathcal{M}}$
should be chosen so as to have good approximation properties over various classes $\mathcal{S}(\alpha,p,R)$
of functions with smoothness $\alpha$ measured in an $L_p$-norm and with semi-norm smaller than
$R$. Otherwise said, each approximating space $\cup_{m\in\mathcal{M}_D} S_m$ - typically nonlinear to deal with
inhomogeneous functions - should satisfy, for a wide range of values of $\alpha$, $p$ and $R$,

\[ \sup_{s\in\mathcal{S}(\alpha,p,R)}\ \inf_{t\in\cup_{m\in\mathcal{M}_D} S_m} \|s-t\|_q \le C(\alpha,p)\, R\, D^{-\alpha/d} \tag{3} \]

for some positive real $C(\alpha,p)$ that only depends on $\alpha$ and $p$. Combining the oracle-type
inequality (2) and the approximation result (3) then provides an estimator $\hat{s}$ with rate at
most of order $(Rn^{-\alpha/d})^{qd/(d+2\alpha)}$ over each class $\mathcal{S}(\alpha,p,R)$, which is usually the minimax
rate. Having at one's disposal spaces $(S_m)_{m\in\mathcal{M}}$ that do not depend on any a priori knowledge
about the smoothness of the function to estimate - other than the scale of spaces it
belongs to - and reaching the approximation rate (3) is thus a real issue for statisticians. In
order to deal with inhomogeneity only, in a multivariate framework, such results appear for
instance in the following references. DeVore, Jawerth and Popov [DJP92], Birgé and Massart
[BM00] or Cohen, Dahmen, Daubechies and DeVore [CDDD01] propose wavelet based
approximation algorithms aimed in particular at Besov type smoothness. Applications of
the approximation result of [BM00] to statistical estimation may be found in Birgé and
Massart [BM97] or Massart [Mas07] for instance. DeVore and Yu [DeV98] are concerned
with piecewise polynomials built on partitions into dyadic cubes, notably for functions with
Besov type smoothness. But their result had to wait until Birgé [Bir06] to be used in Statistics.
More generally, such results are in fact hidden behind all adaptive procedures. Thus, for
both inhomogeneous and anisotropic functions, we refer in particular to the articles cited
in the first paragraph. Let us underline that the procedure studied by Donoho [Don97] and
Klemelä [Kle09], though based on dyadic rectangles instead of cubes, does not rely on a
nonlinear approximation result via piecewise polynomials such as [DY90]. Indeed, the adaptivity
of that estimator follows from its characterization as a wavelet selection procedure
among some tree-structured subfamily of the Haar basis. Other nonlinear wavelet based
approximation results are proved in Hochmuth [Hoc02b] or [Lei03] for anisotropic Besov
spaces. Last, piecewise constant approximation based on dyadic rectangles is studied in
Cohen and Mirebeau [CM09] for nonstandard smoothness spaces under the constraint of
continuous differentiability.

Our aim here is to provide an approximation result tailored for statisticians, whose interest
is illustrated by a new statistical procedure. The first part of the article is devoted to
piecewise polynomial approximation based on partitions into dyadic rectangles. Thanks to
an approximation algorithm inspired by DeVore and Yu [DY90], we obtain approximation
rates akin to (3) over possibly inhomogeneous and anisotropic smoothness classes that
contain for instance the more traditional Besov classes. The approximation rate can be
measured in any $L_q$-norm, for $1 \le q \le \infty$, and we allow an arbitrarily high inhomogeneity
in the sense that we measure the smoothness in an $L_p$-norm with $p$ allowed to be arbitrarily
close to 0. Besides, we take into account a possible restriction on the minimal size of the
dyadic rectangles, which may arise in statistical applications. For estimating a multivariate
function, we then introduce a selection procedure that chooses from the data the best
partition into dyadic rectangles and the best piecewise polynomial built on that partition
thanks to a penalized least-squares type criterion. The degree of the polynomial may vary
from one rectangle to another, and also according to the coordinate directions, so as to
provide a good adaptation both to inhomogeneity and anisotropy. Thus, our procedure
extends the dyadic histogram selection procedures of Donoho [Don97], Klemelä [Kle09] or
Blanchard, Schäfer, Rozenholc and Müller [BSRM07], and the dyadic piecewise polynomial
estimation procedure proposed in a univariate or isotropic framework by Willett and
Nowak [WN07]. We study the theoretical performance of the procedure - with no need to
resort to the "wavelet trick" used in [Don97, Kle09] - for the $L_2$-risk in the density estimation
framework, as [Kle09], but we propose a more refined form of penalty than [Kle09]. For
such a penalty, we provide an oracle-type inequality and adaptivity results in the minimax 
sense over a wide range of possibly inhomogeneous and anisotropic smoothness classes that 
contain Besov type classes. We emphasize that, if the maximal degree of the polynomials 
does not depend on the sample size, we reach the minimax rate up to a constant factor 
only, contrary to all the previously mentioned estimators. This results not only from the 
good approximation properties of dyadic piecewise polynomials, but also from the moder- 
ate number of dyadic partitions of the same size. We can also allow the maximal degree 
of the polynomials to grow logarithmically with the sample size, in which case we reach 
the minimax rate on a growing range of smoothness classes, up to a logarithmic factor. 
Moreover, our procedure can be implemented with a computational complexity only linear 
in the sample size, possibly up to a logarithmic factor, depending on the way we choose the 
maximal degree. 



The plan of the paper is as follows. Section 2 is devoted to piecewise polynomial
approximation based on partitions into dyadic rectangles. In Section 3, we are concerned
with density estimation based on a data-driven choice of a best dyadic piecewise polynomial.
We study there the theoretical properties of the procedure and briefly describe the algorithm
to implement it. Most proofs of Sections 2 and 3 are deferred to the final sections of the
article.
2 Adaptive approximation by dyadic piecewise polynomials



In this section, we present an approximation algorithm by piecewise polynomials built on 
partitions into dyadic rectangles. We study its rate of approximation over some classes of 
functions that may present at the same time anisotropic and inhomogeneous smoothness.

2.1 Notation 

Throughout the article, we fix $d\in\mathbb{N}^*$, and throughout this section, we fix some $d$-tuple
of nonnegative integers $r = (r_1,\ldots,r_d)$ that represent the maximal degree of polynomial
approximation in each direction. We call dyadic rectangle of $[0,1]^d$ any set of the form
$I_1\times\ldots\times I_d$ where, for all $1\le l\le d$,

\[ I_l = [0, 2^{-j_l}] \quad\text{or}\quad I_l = \,]k_l 2^{-j_l}, (k_l+1)2^{-j_l}] \]

with $j_l\in\mathbb{N}$ and $k_l\in\{1,\ldots,2^{j_l}-1\}$. Otherwise said, a dyadic rectangle of $[0,1]^d$ is defined
as a product of $d$ dyadic intervals of $[0,1]$ that may have different lengths. For a partition
$m$ of $[0,1]^d$ into dyadic rectangles, we denote by $|m|$ the number of rectangles in $m$ and
by $S_{(m,r)}$ the space of all piecewise polynomial functions on $[0,1]^d$ which are polynomial
with degree $\le r_l$ in the $l$-th direction, $l = 1,\ldots,d$, over each rectangle of $m$. Besides, for
$0 < p \le \infty$, we denote by $L_p([0,1]^d)$ the set of all real-valued and measurable functions $s$
on $[0,1]^d$ such that the (quasi-)norm

\[ \|s\|_p = \begin{cases} \left(\int_{[0,1]^d} |s(x)|^p \lambda_d(dx)\right)^{1/p} & \text{if } 0 < p < \infty \\ \sup_{x\in[0,1]^d} |s(x)| & \text{if } p = \infty \end{cases} \]

is finite, where $\lambda_d$ is the Lebesgue measure on $[0,1]^d$. Last, $C(\theta)$, $C_i(\theta)$ or $C'_i(\theta)$, $i\in\mathbb{N}^*$,
stand for positive reals that only depend on the parameter $\theta$. Their values may change
from one line to another, unless otherwise said.

2.2 Approximation algorithm 

Let us fix $1 \le q \le \infty$. In order to approximate a possibly anisotropic and inhomogeneous
function $s$ in the $L_q$-norm, we propose an approximation algorithm inspired by [DY90].
We shall construct an adequate piecewise polynomial approximation on a partition into
dyadic rectangles adapted to $s$, beginning with the trivial partition of the unit cube
$[0,1]^d$ and proceeding to successive refinements. For doing so, we consider the criterion

\[ \mathcal{E}_r(s,K)_q = \inf_{P\in\mathcal{P}_r} \|(s-P)\mathbb{1}_K\|_q \tag{4} \]

measuring the error in approximating $s$ on a rectangle $K\subset[0,1]^d$ by some element from
the set $\mathcal{P}_r$ of all polynomials on $[0,1]^d$ with degree $\le r_l$ in the $l$-th direction. We also
fix some threshold $\epsilon > 0$, to be chosen later according to the smoothness assumptions
on $s$. But contrary to [DY90], we allow the degrees of smoothness of $s$ to vary with the
directions and describe them by a multi-index $\sigma = (\sigma_1,\ldots,\sigma_d)\in\prod_{l=1}^d (0, r_l+1)$, in a sense
that will be made precise in the next subsection. Thus, our algorithm is based on a special
subcollection of dyadic rectangles adapted to an anisotropic smoothness measured by $\sigma$.
Indeed, for $j\in\mathbb{N}$, we define $\mathcal{D}_j^{\sigma}$ as the set of all dyadic rectangles $I_1\times\ldots\times I_d\subset[0,1]^d$
such that, for all $1\le l\le d$,

\[ I_l = \left[0, 2^{-\lfloor j\underline{\sigma}/\sigma_l\rfloor}\right] \quad\text{or}\quad I_l = \left]k_l 2^{-\lfloor j\underline{\sigma}/\sigma_l\rfloor}, (k_l+1)2^{-\lfloor j\underline{\sigma}/\sigma_l\rfloor}\right] \]

with $\underline{\sigma} = \min_{1\le l\le d}\sigma_l$ and $k_l\in\{1,\ldots,2^{\lfloor j\underline{\sigma}/\sigma_l\rfloor}-1\}$, and we set $\mathcal{D}^{\sigma} = \cup_{j\in\mathbb{N}}\mathcal{D}_j^{\sigma}$. It should
be noticed that, for all $j\in\mathbb{N}$, any $K\in\mathcal{D}_j^{\sigma}$ can be partitioned into dyadic rectangles of
$\mathcal{D}_{j+1}^{\sigma}$, that we call children of $K$. For $d = 2$ and $\sigma_2 = 2\sigma_1$ for instance, a partition of $[0,1]^2$
into dyadic rectangles from $\mathcal{D}^{\sigma}$ will thus be roughly twice as fine in the first direction, as
illustrated by Figure 1.

Figure 1: Example of partition of $[0,1]^2$ into dyadic rectangles from $\mathcal{D}^{\sigma}$ for $\sigma_2 = 2\sigma_1$.



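The algorithm begins with the set $\mathcal{I}^1(s,\epsilon)$ that only contains $[0,1]^d$. If $\mathcal{E}_r(s,[0,1]^d)_q \le \epsilon$,
then the algorithm stops. Else, $[0,1]^d$ is replaced with its children in $\mathcal{I}^1(s,\epsilon)$, hence a
new partition $\mathcal{I}^2(s,\epsilon)$. In the same way, the $k$-th step begins with a partition $\mathcal{I}^k(s,\epsilon)$
of $[0,1]^d$ into dyadic rectangles that belong to $\mathcal{D}^{\sigma}$. If $\max_{K\in\mathcal{I}^k(s,\epsilon)}\mathcal{E}_r(s,K)_q \le \epsilon$, then
the algorithm stops. Else, a dyadic rectangle $K\in\mathcal{I}^k(s,\epsilon)$ such that $\mathcal{E}_r(s,K)_q > \epsilon$ is
chosen and replaced with its children in $\mathcal{I}^k(s,\epsilon)$, hence a new partition $\mathcal{I}^{k+1}(s,\epsilon)$. Since
$s\in L_q([0,1]^d)$, $\mathcal{E}_r(s,K)_q$ tends to 0 when the Lebesgue measure of $K$ tends to 0, so the
algorithm finally stops. The final partition $\mathcal{I}(s,\epsilon)$ only contains dyadic rectangles that
belong to $\mathcal{D}^{\sigma}$ and such that $\max_{K\in\mathcal{I}(s,\epsilon)}\mathcal{E}_r(s,K)_q \le \epsilon$. For all $K\in\mathcal{I}(s,\epsilon)$, we approximate
$s$ on $K$ by $Q_K(s)$, a polynomial function with degree $\le r_l$ in the $l$-th direction such that
$\|(s-Q_K(s))\mathbb{1}_K\|_q = \mathcal{E}_r(s,K)_q$. Otherwise said, we approximate $s$ on the unit cube by

\[ A(s,\epsilon) = \sum_{K\in\mathcal{I}(s,\epsilon)} Q_K(s)\mathbb{1}_K, \]

thus committing the error

\[ \|s-A(s,\epsilon)\|_q = \left(\sum_{K\in\mathcal{I}(s,\epsilon)} \|(s-Q_K(s))\mathbb{1}_K\|_q^q\right)^{1/q} \le |\mathcal{I}(s,\epsilon)|^{1/q}\,\epsilon \tag{5} \]

if $1\le q<\infty$, and

\[ \|s-A(s,\epsilon)\|_\infty = \max_{K\in\mathcal{I}(s,\epsilon)} \|(s-Q_K(s))\mathbb{1}_K\|_\infty \le \epsilon \tag{6} \]

if $q = \infty$.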
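The refinement loop of this subsection can be illustrated by a minimal sketch under simplifying assumptions that are not part of the article: $q = 2$, piecewise constants ($r = 0$), and a function $s$ known only through its values on a regular grid with $2^N$ points per direction, so that the local error is a discretized $L_2$ error. The function names and the cap `max_level` are hypothetical.

```python
import itertools
import math
from fractions import Fraction

import numpy as np


def level_exponents(j, sigma):
    """Exponents e_l = floor(j * min(sigma) / sigma_l): level-j side lengths are 2^-e_l."""
    s_min = Fraction(min(sigma))
    return tuple(math.floor(j * s_min / Fraction(s)) for s in sigma)


def children(rect, j, sigma):
    """Split a level-j rectangle into its level-(j+1) children.

    A rectangle is a tuple of pairs (k_l, e_l) encoding the dyadic
    interval from k_l * 2^-e_l to (k_l + 1) * 2^-e_l in direction l.
    """
    new_exps = level_exponents(j + 1, sigma)
    per_axis = []
    for (k, e), e_new in zip(rect, new_exps):
        factor = 2 ** (e_new - e)  # 1 (no cut) or 2 (one dyadic cut)
        per_axis.append([(k * factor + i, e_new) for i in range(factor)])
    return list(itertools.product(*per_axis))


def local_error(values, rect, n_levels):
    """Discretized L2 error of the best constant approximation on rect."""
    slices = tuple(
        slice(k << (n_levels - e), (k + 1) << (n_levels - e)) for k, e in rect
    )
    block = values[slices]
    cell_volume = 2.0 ** (-n_levels * values.ndim)
    return math.sqrt(float(np.sum((block - block.mean()) ** 2)) * cell_volume)


def adaptive_partition(values, sigma, eps, max_level):
    """Refine until every rectangle has local error <= eps (or max_level is hit)."""
    n_levels = int(round(math.log2(values.shape[0])))
    max_level = min(max_level, n_levels)  # never cut below the grid resolution
    root = tuple((0, 0) for _ in sigma)
    stack, leaves = [(root, 0)], []
    while stack:
        rect, j = stack.pop()
        if j >= max_level or local_error(values, rect, n_levels) <= eps:
            leaves.append(rect)
        else:
            stack.extend((child, j + 1) for child in children(rect, j, sigma))
    return leaves
```

With `sigma = (1, 2)`, the rectangles are cut at every level in the first direction and only every other level in the second one, which is the anisotropic refinement pattern described above.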



2.3 Approximation rate over anisotropic function classes 

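In order to study the approximation rate of the previous algorithm, we introduce function
spaces that arise naturally from the way the algorithm proceeds. Let us fix $\sigma\in\prod_{l=1}^d(0,r_l+1)$
and $0 < p,p' \le \infty$. For $s\in L_p([0,1]^d)$ and $k\in\mathbb{N}$, we set

\[ e_{r,\sigma,p,k}(s) = \inf_{t\in\Pi_k^{r,\sigma}} \|s-t\|_p \tag{7} \]

where $\Pi_k^{r,\sigma}$ is the set of all piecewise polynomial functions on $[0,1]^d$ that are polynomial with
degree $\le r_l$ in the $l$-th direction over each rectangle in $\mathcal{D}_k^{\sigma}$. Then, we define $\mathcal{N}_{p'}^{r,\sigma}(L_p([0,1]^d))$
as the set of all functions $s\in L_p([0,1]^d)$ such that the quantity

\[ N_{r,\sigma,p,p'}(s) = \begin{cases} \left(\sum_{k\in\mathbb{N}} \left(2^{k\underline{\sigma}}\, e_{r,\sigma,p,k}(s)\right)^{p'}\right)^{1/p'} & \text{if } 0 < p' < \infty \\ \sup_{k\in\mathbb{N}} \left(2^{k\underline{\sigma}}\, e_{r,\sigma,p,k}(s)\right) & \text{if } p' = \infty \end{cases} \]

is finite. One can easily verify that $N_{r,\sigma,p,p'}$ is a (quasi-)semi-norm on $\mathcal{N}_{p'}^{r,\sigma}(L_p([0,1]^d))$,
and that $\mathcal{N}_{p'}^{r,\sigma}(L_p([0,1]^d))$ gets larger as $p'$ increases since

\[ N_{r,\sigma,p,p'_2}(s) \le N_{r,\sigma,p,p'_1}(s) \quad\text{for } 0 < p'_1 \le p'_2 \le \infty. \tag{8} \]

If $p \ge q$, then $\mathcal{N}_{p'}^{r,\sigma}(L_p([0,1]^d))$ is obviously embedded in the space $L_q([0,1]^d)$ in which we
measure the quality of approximation. The same property still holds for $p$ smaller than $q$,
under adequate assumptions on the harmonic mean $H(\sigma)$ of $\sigma_1,\ldots,\sigma_d$, i.e.

\[ H(\sigma) = \left(\frac{1}{d}\sum_{l=1}^d \frac{1}{\sigma_l}\right)^{-1}. \]

Indeed, denoting by $(x)_+ = \max\{x, 0\}$ for any real $x$, we prove later in the article the following
continuous embedding.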
continuous embedding. 
Proposition 1 Let $\sigma\in\prod_{l=1}^d(0,r_l+1)$, $0 < p,p' \le \infty$ and $1 \le q \le \infty$. If

\[ H(\sigma)/d > (1/p - 1/q)_+, \]

then $\mathcal{N}_{p'}^{r,\sigma}(L_p([0,1]^d)) \subset L_q([0,1]^d)$ and, for all $s\in\mathcal{N}_{p'}^{r,\sigma}(L_p([0,1]^d))$,

\[ \|s\|_q \le C(d,r,\sigma,p,p',q)\left(\|s\|_p + N_{r,\sigma,p,p'}(s)\right). \]
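As a small numerical aid, the embedding condition of Proposition 1 can be checked for given smoothness indices. The sketch below only evaluates the harmonic mean and the inequality $H(\sigma)/d > (1/p - 1/q)_+$; the function names are hypothetical, and `math.inf` is used to encode $p = \infty$ or $q = \infty$.

```python
import math


def harmonic_mean(sigma):
    """Harmonic mean H(sigma) of the directional smoothness indices."""
    return len(sigma) / sum(1.0 / s for s in sigma)


def embedding_condition(sigma, p, q):
    """True when H(sigma)/d > (1/p - 1/q)_+ ; p or q may be math.inf."""
    gap = max(1.0 / p - 1.0 / q, 0.0)
    return harmonic_mean(sigma) / len(sigma) > gap
```

For instance, with $d = 2$ and $\sigma = (1, 2)$, $H(\sigma) = 4/3$, so the condition holds for $p = 1$, $q = 2$ since $2/3 > 1/2$.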

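The reader familiar with classical function spaces will have noted the similarity between
the definition and the embedding properties of spaces $\mathcal{N}_{p'}^{r,\sigma}(L_p([0,1]^d))$ and those of Besov
spaces. Before going further, let us recall the definition of the latter according to [ST87],
for instance. We denote by $(b_1,\ldots,b_d)$ the canonical basis of $\mathbb{R}^d$ and set $\mathcal{R} = [0,1]^d$. For
all $\sigma = (\sigma_1,\ldots,\sigma_d)\in(0,+\infty)^d$, $0 < p,p' \le \infty$, $s\in L_p([0,1]^d)$, $h > 0$ and $1\le l\le d$, we
define

\[ \mathcal{R}(\sigma_l,h) = \{x\in[0,1]^d \text{ s.t. } x, x+hb_l, \ldots, x+(\lfloor\sigma_l\rfloor+1)hb_l \in \mathcal{R}\}, \]

\[ \Delta_{hb_l}^{\lfloor\sigma_l\rfloor+1} s(x) = \sum_{k=0}^{\lfloor\sigma_l\rfloor+1} \binom{\lfloor\sigma_l\rfloor+1}{k} (-1)^{\lfloor\sigma_l\rfloor+1-k} s(x+khb_l), \quad\text{for } x\in\mathcal{R}(\sigma_l,h), \]

\[ \omega_l^{\lfloor\sigma_l\rfloor+1}(s,y,\mathcal{R})_p = \sup_{0<h\le y} \left\|\Delta_{hb_l}^{\lfloor\sigma_l\rfloor+1} s\,\mathbb{1}_{\mathcal{R}(\sigma_l,h)}\right\|_p, \quad\text{for } y>0, \]

\[ |s|_{\sigma,p,p'} = \begin{cases} \sum_{l=1}^d \left(\int_0^{\infty} \left[y^{-\sigma_l}\omega_l^{\lfloor\sigma_l\rfloor+1}(s,y,\mathcal{R})_p\right]^{p'} \frac{dy}{y}\right)^{1/p'} & \text{if } 0 < p' < \infty \\ \sum_{l=1}^d \sup_{y>0}\, y^{-\sigma_l}\omega_l^{\lfloor\sigma_l\rfloor+1}(s,y,\mathcal{R})_p & \text{if } p' = \infty. \end{cases} \]

For $\sigma = (\sigma_1,\ldots,\sigma_d)\in(0,+\infty)^d$, $0 < p,p' \le \infty$, we denote by $\mathcal{B}_{p'}^{\sigma}(L_p([0,1]^d))$ the space
of all measurable functions $s\in L_p([0,1]^d)$ such that $|s|_{\sigma,p,p'}$ is finite. According to the
proposition below, Besov spaces $\mathcal{B}_{p'}^{\sigma}(L_p([0,1]^d))$ are embedded in spaces $\mathcal{N}_{p'}^{r,\sigma}(L_p([0,1]^d))$.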



Proposition 2 Let $\sigma\in\prod_{l=1}^d(0,r_l+1)$, $0 < p < \infty$ and $0 < p' \le \infty$. For all
$s\in\mathcal{B}_{p'}^{\sigma}(L_p([0,1]^d))$,

\[ N_{r,\sigma,p,p'}(s) \le C(d,r,\sigma,p,p')\,|s|_{\sigma,p,p'}. \]

We shall not give a proof of that proposition here, since it relies exactly on the same
arguments as those used by [Hoc02a] in the proof of Theorem 4.1 (beginning of page
197) combined with Inequality (14) in the same reference. It should be noticed that
the space $\mathcal{N}_{p'}^{r,\sigma}(L_p([0,1]^d))$ is in general larger than $\mathcal{B}_{p'}^{\sigma}(L_p([0,1]^d))$. Indeed, contrary
to $\mathcal{B}_{p'}^{\sigma}(L_p([0,1]^d))$, the space $\mathcal{N}_{p'}^{r,\sigma}(L_p([0,1]^d))$ contains discontinuous functions (piecewise
polynomials, for instance) even for $H(\sigma)/d > 1/p$.

We are now able to state approximation rates over anisotropic classes of the form

\[ \mathcal{S}(r,\sigma,p,p',R) = \{s\in L_p([0,1]^d) \text{ s.t. } N_{r,\sigma,p,p'}(s) \le R\}, \]

where $\sigma\in\prod_{l=1}^d(0,r_l+1)$, $0 < p,p' \le \infty$ and $R > 0$, thus extending the result of DeVore and
Yu [DY90] (Corollary 3.3), which is only devoted to functions with isotropic smoothness.
The approximation rate is related to the harmonic mean $H(\sigma)$ of $\sigma_1,\ldots,\sigma_d$, which in case
of isotropic smoothness of order $\sigma$, i.e. if $\sigma_1 = \ldots = \sigma_d = \sigma$, reduces to $\sigma$.

Theorem 1 Let $R > 0$, $\sigma = (\sigma_1,\ldots,\sigma_d)\in\prod_{l=1}^d(0,r_l+1)$, $0 < p < \infty$ and $1 \le q \le \infty$
such that

\[ H(\sigma)/d > (1/p - 1/q)_+. \]

Assume that $s\in\mathcal{S}(r,\sigma,p,p',R)$, where $p' = \infty$ if $0 < p \le 1$ or $p \ge q$, and $p' = p$
if $1 < p < q$. Then, for all $k\in\mathbb{N}$, there exists some partition $m$ of $[0,1]^d$ into dyadic
rectangles, that may depend on $s,d,r,\sigma,p$ and $q$, such that

\[ |m| \le C_1(d,\sigma,p)\,2^{kd} \]

and

\[ \inf_{t\in S_{(m,r)}} \|s-t\|_q \le C_2(d,r,\sigma,p,q)\,R\,2^{-kH(\sigma)}. \tag{9} \]

The same result still holds whatever $0 < p' \le \infty$ if $0 < p \le 1$ or $p \ge q$, and whatever
$0 < p' \le p$ if $1 < p < q$, as a straightforward consequence of Theorem 1 and Inequality (8).
Denoting by $\mathcal{M}_D$, $D\in\mathbb{N}^*$, the set of all the partitions of $[0,1]^d$ into $D$ dyadic
rectangles, we obtain uniform approximation rates simultaneously over a wide range of
classes $\mathcal{S}(r,\sigma,p,p',R)$ by considering the nonlinear approximating space $\cup_{m\in\mathcal{M}_D} S_{(m,r)}$.
That property is stated more precisely in Corollary 1 below, which can be immediately
derived from Theorem 1.

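Corollary 1 Let $R > 0$, $\sigma = (\sigma_1,\ldots,\sigma_d)\in\prod_{l=1}^d(0,r_l+1)$, $0 < p < \infty$, $0 < p' \le \infty$
and $1 \le q \le \infty$ satisfying the assumptions of Theorem 1. For all $D \ge C_1(d,\sigma,p)$, where
$C_1(d,\sigma,p)$ is given by Theorem 1,

\[ \sup_{s\in\mathcal{S}(r,\sigma,p,p',R)}\ \inf_{t\in\cup_{m\in\mathcal{M}_D} S_{(m,r)}} \|s-t\|_q \le C'_2(d,r,\sigma,p,q)\,R\,D^{-H(\sigma)/d}. \]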

We also propose a more refined version of Theorem 1 that makes it possible to take into account
constraints on the minimal dimensions of the dyadic rectangles, which will prove most useful
for estimation purposes in the next section. We recall that $\underline{\sigma} = \min_{1\le l\le d}\sigma_l$.

Theorem 2 Let $J\in\mathbb{N}$, $R > 0$, $\sigma = (\sigma_1,\ldots,\sigma_d)\in\prod_{l=1}^d(0,r_l+1)$, $0 < p < \infty$, $0 < p' \le \infty$
and $1 \le q \le \infty$ such that

\[ H(\sigma)/d > (1/p - 1/q)_+. \]

Assume that $s\in\mathcal{S}(r,\sigma,p,p',R)$, where $p' = \infty$ if $0 < p \le 1$ or $p \ge q$, and $p' = p$ if
$1 < p < q$. Then, for all $k\in\mathbb{N}$, there exists some partition $m$ of $[0,1]^d$, that may depend
on $s,d,r,\sigma,p$ and $q$, only contains dyadic rectangles with sidelength at least $2^{-J\underline{\sigma}/\sigma_l}$ in the
$l$-th direction, $l = 1,\ldots,d$, and satisfies both

\[ |m| \le C_1(d,\sigma,p)\,2^{kd} \]

and

\[ \inf_{t\in S_{(m,r)}} \|s-t\|_q \le C_3(d,r,\sigma,p,q)\,R\left(2^{-Jd(H(\sigma)/d-(1/p-1/q)_+)\underline{\sigma}/H(\sigma)} + 2^{-kH(\sigma)}\right). \tag{10} \]

Remark: Given $J\in\mathbb{N}$, that theorem relies on applying the approximation algorithm of
Section 2.2 to an approximation of $s$ from $S_{(m_J,r)}$, where $m_J$ is the partition of $[0,1]^d$
into the dyadic rectangles from $\mathcal{D}_J^{\sigma}$. Thus, the term $2^{-Jd(H(\sigma)/d-(1/p-1/q)_+)\underline{\sigma}/H(\sigma)}$ in (10),
which is of order $(\dim(S_{(m_J,r)}))^{-(H(\sigma)/d-(1/p-1/q)_+)}$, corresponds with an upper-bound for
the linear approximation error $\inf_{t\in S_{(m_J,r)}}\|s-t\|_q$. The upper-bound (10) is of the same
order as (9) - up to a real that only depends on $d,r,\sigma,p,q$ - as long as

\[ k \le J\,\frac{d}{H(\sigma)}\left(\frac{H(\sigma)}{d} - \left(\frac{1}{p}-\frac{1}{q}\right)_+\right)\frac{\underline{\sigma}}{H(\sigma)}. \tag{11} \]

If $p \ge q$ and $\underline{\sigma} = H(\sigma)$, i.e. if $s$ has homogeneous and isotropic smoothness, then that
condition simply amounts to $k \le J$. Otherwise, Condition (11) is all the more stringent
as $p$ is small by comparison with $q$ or as $\underline{\sigma}$ is small by comparison with $H(\sigma)$, i.e. all the
more stringent as inhomogeneity or anisotropy are pronounced.
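To get a feel for how the constraint on $k$ in this remark tightens with inhomogeneity and anisotropy, here is a minimal sketch that evaluates the right-hand side $J\,(d/H(\sigma))\,(H(\sigma)/d - (1/p-1/q)_+)\,\underline{\sigma}/H(\sigma)$ and returns the largest admissible integer. The function name is hypothetical, and the formula is the one read off from the remark, so it inherits any uncertainty in that reading.

```python
import math


def largest_admissible_k(J, sigma, p, q):
    """Largest integer k satisfying k <= J (d/H)(H/d - (1/p - 1/q)_+) min(sigma)/H."""
    d = len(sigma)
    H = d / sum(1.0 / s for s in sigma)      # harmonic mean of sigma
    gap = max(1.0 / p - 1.0 / q, 0.0)        # (1/p - 1/q)_+
    bound = J * (d / H) * (H / d - gap) * (min(sigma) / H)
    return math.floor(bound)
```

In the homogeneous isotropic case ($p \ge q$, all $\sigma_l$ equal) the function returns $J$; it returns smaller values once $p < q$ or once the $\sigma_l$ differ.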

Given $J\in\mathbb{N}$, let us denote by $\mathcal{M}_D^J$ the set of all the partitions into $D$ dyadic rectangles
with sidelengths $\ge 2^{-J}$, for $D\in\mathbb{N}^*$. We can still obtain uniform approximation rates
simultaneously over a wide range of classes $\mathcal{S}(r,\sigma,p,p',R)$ under the constraint that the
piecewise polynomial approximations are built over dyadic rectangles with sidelengths $\ge 2^{-J}$,
by introducing this time the nonlinear approximation space $\cup_{m\in\mathcal{M}_D^J} S_{(m,r)}$. Indeed, as
for all $\sigma = (\sigma_1,\ldots,\sigma_d)\in\prod_{l=1}^d(0,r_l+1)$ and $l = 1,\ldots,d$, $2^{-J\underline{\sigma}/\sigma_l} \ge 2^{-J}$, a straightforward
consequence of Theorem 2 is Corollary 2 below.

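Corollary 2 Let $J\in\mathbb{N}$, $R > 0$, $\sigma = (\sigma_1,\ldots,\sigma_d)\in\prod_{l=1}^d(0,r_l+1)$, $0 < p < \infty$, $0 < p' \le \infty$
and $1 \le q \le \infty$ satisfying the assumptions of Theorem 2. For all $D \ge C_1(d,\sigma,p)$, where
$C_1(d,\sigma,p)$ is given by Theorem 2,

\[ \sup_{s\in\mathcal{S}(r,\sigma,p,p',R)}\ \inf_{t\in\cup_{m\in\mathcal{M}_D^J} S_{(m,r)}} \|s-t\|_q \le C'_3(d,r,\sigma,p,q)\,R\left(2^{-Jd(H(\sigma)/d-(1/p-1/q)_+)\underline{\sigma}/H(\sigma)} + D^{-H(\sigma)/d}\right). \]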
3 Application to density estimation



This section aims at illustrating the interest of the previous approximation results in statis- 
tics. More precisely, placing ourselves in the density estimation framework, we show that 
combining estimation via dyadic piecewise polynomial selection and the aforementioned 
approximation results leads to a new density estimator which is able to adapt to the un- 
known smoothness of the function to estimate, even though it is both anisotropic and 
inhomogeneous. Besides, we explain how such a procedure can be implemented efficiently. 

3.1 Framework and notation 

Let $n\in\mathbb{N}$, $n \ge 4$. We observe independent and identically distributed random variables
$Y_1,\ldots,Y_n$ defined on the same measurable space $(\Omega,\mathcal{A})$ and taking values in $[0,1]^d$. We
assume that $Y_1,\ldots,Y_n$ admit the same density $s$ with respect to the Lebesgue measure $\lambda_d$
on $[0,1]^d$ and that $s\in L_2([0,1]^d)$. We denote by $P_s$ the joint distribution of $(Y_1,\ldots,Y_n)$,
that is the probability measure with density

\[ \frac{dP_s}{d\lambda_d^{\otimes n}} : (y_1,\ldots,y_n)\in[0,1]^d\times\ldots\times[0,1]^d \mapsto \prod_{i=1}^n s(y_i), \]

while $\mathbb{P}_s$ stands for the underlying probability measure on $(\Omega,\mathcal{A})$, so that for all product $B$
of $n$ rectangles of $[0,1]^d$

\[ P_s(B) = \mathbb{P}_s\left(\{\omega\in\Omega \text{ s.t. } (Y_1(\omega),\ldots,Y_n(\omega))\in B\}\right). \]

The expectation and variance associated with $\mathbb{P}_s$ are denoted by $\mathbb{E}_s$ and $\mathrm{Var}_s$.

3.2 Dyadic piecewise polynomial estimators 

Let $m$ be some partition of $[0,1]^d$ into dyadic rectangles and $\rho = (\rho_K)_{K\in m}$ a sequence such
that, for all $K\in m$, $\rho_K = (\rho_K(1),\ldots,\rho_K(d))\in\mathbb{N}^d$. We denote by $S_{(m,\rho)}$ the space of all
functions $t : [0,1]^d\to\mathbb{R}$ such that, for all $K\in m$, $t$ is polynomial with degree $\le\rho_K(l)$ in
the $l$-th direction on the rectangle $K$. In particular, if $\rho$ is constant and equal to $r$, then
$S_{(m,\rho)}$ coincides with the space $S_{(m,r)}$ introduced in Section 2. Let $\langle .,.\rangle$ be the usual scalar
product on $L_2([0,1]^d)$. We recall that $s$ minimizes over $t\in L_2([0,1]^d)$

\[ \|s-t\|_2^2 - \|s\|_2^2 = \|t\|_2^2 - 2\langle t,s\rangle = \mathbb{E}_s[\gamma(t)], \]

where

\[ \gamma(t) = \|t\|_2^2 - \frac{2}{n}\sum_{i=1}^n t(Y_i) \]

only depends on the observed variables. Thus, a natural estimator of $s$ with values in $S_{(m,\rho)}$
is

\[ \hat{s}_{(m,\rho)} = \operatorname*{argmin}_{t\in S_{(m,\rho)}} \gamma(t), \]

that we will call a dyadic piecewise polynomial estimator. Such an estimator is just
a projection estimator of $s$ on $S_{(m,\rho)}$. Indeed, if for each dyadic rectangle $K$ we set
$\Lambda(\rho_K) = \prod_{l=1}^d\{0,\ldots,\rho_K(l)\}$ and denote by $(\Phi_{K,k})_{k\in\Lambda(\rho_K)}$ an orthonormal basis of the
space of polynomial functions over $K$ with degree $\le\rho_K(l)$ in the $l$-th direction, then simple
computations lead to

\[ \hat{s}_{(m,\rho)} = \sum_{K\in m}\sum_{k\in\Lambda(\rho_K)} \left(\frac{1}{n}\sum_{i=1}^n \Phi_{K,k}(Y_i)\right)\Phi_{K,k}. \]

For theoretical reasons, we shall choose in the remainder of the article an orthonormal basis
$(\Phi_{K,k})_{k\in\Lambda(\rho_K)}$ derived from the Legendre polynomials in the following way. Let $(Q_j)_{j\in\mathbb{N}}$
be the orthogonal family of the Legendre polynomials in $L_2([-1,1])$. For $K = \prod_{l=1}^d[u_l,v_l]$
rectangle of $[0,1]^d$, $k = (k(1),\ldots,k(d))\in\mathbb{N}^d$ and $x = (x_1,\ldots,x_d)\in[0,1]^d$, we set

\[ \pi(k) = \prod_{l=1}^d (2k(l)+1) \]

and

\[ \Phi_{K,k}(x) = \sqrt{\frac{\pi(k)}{\lambda_d(K)}}\ \prod_{l=1}^d Q_{k(l)}\!\left(\frac{2x_l-u_l-v_l}{v_l-u_l}\right)\mathbb{1}_K(x). \]

We recall that, for all $j\in\mathbb{N}$, $Q_j$ satisfies

\[ \|Q_j\|_\infty = 1 \quad\text{and}\quad \|Q_j\|_2^2 = \frac{2}{2j+1}. \]

Therefore, for $K$ rectangle in $[0,1]^d$ and $\rho_K\in\mathbb{N}^d$, $(\Phi_{K,k})_{k\in\Lambda(\rho_K)}$ is a basis of the space of
piecewise polynomial functions with support $K$ and degree $\le\rho_K(l)$ in the $l$-th direction,
which is orthonormal for the norm $\|.\|_2$ and satisfies

\[ \|\Phi_{K,k}\|_\infty^2 = \frac{\pi(k)}{\lambda_d(K)}. \tag{12} \]
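The projection estimator described in this subsection can be sketched numerically as follows. This is an illustration under assumptions, not the article's implementation: the basis functions are taken to be tensorized, rescaled Legendre polynomials normalized by $\sqrt{\pi(k)/\lambda_d(K)}$ (the normalization consistent with (12)), rectangles are treated as closed on both sides for simplicity, and the names `phi` and `projection_estimator` are hypothetical.

```python
import itertools

import numpy as np
from numpy.polynomial import legendre


def phi(K, k, x):
    """Evaluate the rescaled Legendre basis function Phi_{K,k} at the rows of x.

    K is a list of (u_l, v_l) pairs, k a tuple of degrees, x an (n, d) array.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    value = np.ones(x.shape[0])
    inside = np.ones(x.shape[0], dtype=bool)
    volume, pi_k = 1.0, 1.0
    for l, ((u, v), deg) in enumerate(zip(K, k)):
        z = (2.0 * x[:, l] - u - v) / (v - u)   # map [u, v] onto [-1, 1]
        coeffs = np.zeros(deg + 1)
        coeffs[deg] = 1.0                        # selects the Legendre polynomial Q_deg
        value = value * legendre.legval(z, coeffs)
        inside &= (x[:, l] >= u) & (x[:, l] <= v)
        volume *= v - u
        pi_k *= 2 * deg + 1
    return np.sqrt(pi_k / volume) * value * inside


def projection_estimator(Y, partition, degrees):
    """Return x -> s_hat(x) for the sample Y (n, d).

    partition is a list of rectangles, degrees the list of maximal degrees rho_K.
    """
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    terms = []
    for K, rho in zip(partition, degrees):
        for k in itertools.product(*(range(r + 1) for r in rho)):
            # empirical coefficient (1/n) sum_i Phi_{K,k}(Y_i)
            terms.append((K, k, phi(K, k, Y).mean()))

    def s_hat(x):
        return sum(c * phi(K, k, x) for K, k, c in terms)

    return s_hat
```

With all degrees equal to 0, the estimator reduces to the histogram on the partition: the value on $K$ is the fraction of observations falling in $K$ divided by $\lambda_d(K)$.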

For each partition $m$ of $[0,1]^d$ into dyadic rectangles and each $\rho = (\rho_K)_{K\in m}\in(\mathbb{N}^d)^{|m|}$,
we can evaluate the performance of $\hat{s}_{(m,\rho)}$ by giving an upper-bound for its quadratic risk.
For that purpose, we introduce the orthogonal projection $s_{(m,\rho)}$ of $s$ on $S_{(m,\rho)}$, the dimension
$\dim(S_{(m,\rho)})$ of $S_{(m,\rho)}$, i.e.

\[ \dim(S_{(m,\rho)}) = \sum_{K\in m} |\Lambda(\rho_K)| = \sum_{K\in m}\prod_{l=1}^d (\rho_K(l)+1), \]

and define $\rho_{\max} = (\rho_{\max}(1),\ldots,\rho_{\max}(d))$ by

\[ \rho_{\max}(l) = \max_{K\in m}\rho_K(l), \quad l = 1,\ldots,d. \tag{13} \]



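Proposition 3 Let $m$ be a partition of $[0,1]^d$ into dyadic rectangles and $\rho = (\rho_K)_{K\in m}\in(\mathbb{N}^d)^{|m|}$.
If $s\in L_2([0,1]^d)$, then

\[ \mathbb{E}_s\left[\|s-\hat{s}_{(m,\rho)}\|_2^2\right] = \|s-s_{(m,\rho)}\|_2^2 + \frac{1}{n}\sum_{K\in m}\sum_{k\in\Lambda(\rho_K)} \mathrm{Var}_s\left(\Phi_{K,k}(Y_1)\right). \]

If $\|s\|_\infty$ is finite, then

\[ \mathbb{E}_s\left[\|s-\hat{s}_{(m,\rho)}\|_2^2\right] \le \|s-s_{(m,\rho)}\|_2^2 + \pi(\rho_{\max})\,\|s\|_\infty\,\frac{\dim(S_{(m,\rho)})}{n}. \]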

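Proof: Pythagoras' Equality gives

\[ \mathbb{E}_s\left[\|s-\hat{s}_{(m,\rho)}\|_2^2\right] = \|s-s_{(m,\rho)}\|_2^2 + \mathbb{E}_s\left[\|s_{(m,\rho)}-\hat{s}_{(m,\rho)}\|_2^2\right]. \]

Then, we deduce the first equality in Proposition 3 from the expressions of $\hat{s}_{(m,\rho)}$ and
$s_{(m,\rho)}$ in the orthonormal basis $(\Phi_{K,k})_{K\in m,k\in\Lambda(\rho_K)}$ of $S_{(m,\rho)}$ and the fact that $Y_1,\ldots,Y_n$
are independent and identically distributed.

If $s$ is bounded, we deduce from (12) that, for all $K\in m$ and $k\in\Lambda(\rho_K)$,

\[ \mathbb{E}_s\left[\Phi_{K,k}^2(Y_1)\right] \le \|\Phi_{K,k}\|_\infty^2 \int_K s\,d\lambda_d \le \|s\|_\infty\,\pi(\rho_{\max}), \]

hence the upper-bound for $\mathbb{E}_s\left[\|s-\hat{s}_{(m,\rho)}\|_2^2\right]$. ■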

Thus, we recover that, for bounded densities at least, choosing a model $S_{(m,\rho)}$ that realizes
a good compromise between the approximation error and the dimension of the model
leads to an estimator with small risk. Such a choice in fact proves optimal for
densities presenting the kind of smoothness described in Section 2.3. More precisely, for
$\sigma\in(0,+\infty)^d$, $0 < p,p' \le \infty$, $R > 0$ and $L > 0$, we set $\lfloor\sigma\rfloor = (\lfloor\sigma_1\rfloor,\ldots,\lfloor\sigma_d\rfloor)$ and consider
the class $\mathcal{P}(\sigma,p,p',R,L)$ of all the probability densities $s$ with respect to $\lambda_d$ such that
$s\in\mathcal{S}(\lfloor\sigma\rfloor+1,\sigma,p,p',R)$ and $\|s\|_\infty \le L$. Thanks to the upper-bound of Proposition 3,
we obtain in Proposition 4 below that any statistical procedure which is able to realize
approximately $\inf_{m\in\mathcal{M},\rho\in\mathbb{N}^d}\mathbb{E}_s\left[\|s-\hat{s}_{(m,\rho)}\|_2^2\right]$, where $\mathcal{M}$ is the collection of all the partitions
of $[0,1]^d$ into dyadic rectangles, enjoys adaptivity properties: it also reaches approximately
the minimax risk over a wide range of classes $\mathcal{P}(\sigma,p,p',R,L)$.
Proposition 4 For $0 < p < \infty$, let $p' = \infty$ when $0 < p \le 1$ or $p \ge 2$, and $p' = p$ when
$1 < p < 2$. For all $L > 0$ and $R \ge n^{-1/2}$, if $\sigma\in(0,+\infty)^d$ and $0 < p < \infty$ satisfy
$H(\sigma)/d > (1/p-1/2)_+$, then

\[ \sup_{s\in\mathcal{P}(\sigma,p,p',R,L)}\ \inf_{m\in\mathcal{M},\rho\in\mathbb{N}^d} \mathbb{E}_s\left[\|s-\hat{s}_{(m,\rho)}\|_2^2\right] \le C(d,\sigma,p,L)\left(Rn^{-H(\sigma)/d}\right)^{2d/(d+2H(\sigma))} \le C(d,\sigma,p,L)\,\inf_{\hat{s}}\ \sup_{s\in\mathcal{P}(\sigma,p,p',R,L)} \mathbb{E}_s\left[\|s-\hat{s}\|_2^2\right] \]

where the last infimum is taken over all the estimators $\hat{s}$ of $s$.

Proof: Let us fix $\sigma,p,p',R,L$ satisfying the assumptions of Proposition 4 and choose
$\rho = \lfloor\sigma\rfloor+1$. For all $s\in\mathcal{P}(\sigma,p,p',R,L)$, we deduce from Proposition 3 and Theorem 1
that

\[ \inf_{m\in\mathcal{M}} \mathbb{E}_s\left[\|s-\hat{s}_{(m,\rho)}\|_2^2\right] \le C(d,\sigma,p,L)\,\inf_{k\in\mathbb{N}}\left\{R^2 2^{-2kH(\sigma)} + \frac{2^{kd}}{n}\right\}. \]

We then choose $k_*$ as the greatest integer $k\in\mathbb{N}$ such that $2^{kd}/n \le R^2 2^{-2kH(\sigma)}$, i.e. such
that $2^{k} \le (nR^2)^{1/(d+2H(\sigma))}$, so as to bound the infimum on the right-hand side, which
provides the first inequality in Proposition 4.

Let us define the Besov class $\mathcal{B}(\sigma,p,p',R,L)$ of all the probability densities $s$ with respect
to $\lambda_d$ such that $|s|_{\sigma,p,p'} \le R$ (where $|.|_{\sigma,p,p'}$ is defined in Section 2.3) and $\|s\|_\infty \le L$.
We deduce from Proposition 2 that, for $0 < p \le 1$ or $p \ge 2$, there exists some positive
real $C(\sigma,p)$ such that $\mathcal{P}(\sigma,p,\infty,R,L)$ contains $\mathcal{B}(\sigma,p,\infty,C(\sigma,p)R,L)$, and, for
$1 < p < 2$, there exists some positive real $C(\sigma,p)$ such that $\mathcal{P}(\sigma,p,p,R,L)$ contains
$\mathcal{B}(\sigma,p,p,C(\sigma,p)R,L)$. Besides, according to Triebel [Tri11] (Proposition 10), for all $\epsilon > 0$,
the Kolmogorov $\epsilon$-entropy in $L_2([0,1]^d)$ of the Besov space $\mathcal{B}_{p'}^{\sigma}(L_q([0,1]^d))$ is of order $\epsilon^{-d/H(\sigma)}$ for
$H(\sigma)/d > (1/q-1/2)_+$. Thus, the second inequality in Proposition 4 follows from the
lower-bounds for minimax risks proved in [YB99] (Proposition 1, (ii)). ■
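The balance between approximation error and dimension used in the first part of this proof can be checked numerically. The sketch below assumes the bound has the form $R^2 2^{-2kH(\sigma)} + 2^{kd}/n$ with all constants set to 1, and that $R \ge n^{-1/2}$ so that the selected level is nonnegative; the function names are hypothetical.

```python
import math


def risk_bound(k, n, R, d, H):
    """Squared-bias term plus variance term at level k (constants set to 1)."""
    return R ** 2 * 2.0 ** (-2 * k * H) + 2.0 ** (k * d) / n


def k_star(n, R, d, H):
    """Greatest integer k with 2^(k d) / n <= R^2 2^(-2 k H)."""
    return math.floor(math.log2(n * R ** 2) / (d + 2 * H))


def minimax_rate(n, R, d, H):
    """The rate (R n^(-H/d))^(2d / (d + 2H))."""
    return (R * n ** (-H / d)) ** (2 * d / (d + 2 * H))
```

At $k = k_*$ the variance term is at most the bias term, and $2^{k_*+1} > (nR^2)^{1/(d+2H(\sigma))}$, so `risk_bound(k_star(...), ...)` is at most $2^{2H(\sigma)+1}$ times `minimax_rate(...)`.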

In the sequel, our problem will thus be to build a statistical procedure that requires no
prior knowledge on $s$ but whose risk behaves almost as $\inf_{m\in\mathcal{M},\rho\in\mathbb{N}^d}\mathbb{E}_s\left[\|s-\hat{s}_{(m,\rho)}\|_2^2\right]$.



3.3 Dyadic piecewise polynomial selection 

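Let us fix $r_\star\in\mathbb{N}^d$, $J_\star\in\mathbb{N}$, and denote by $\mathcal{M}^{J_\star}$ the set of all partitions of $[0,1]^d$ into dyadic
rectangles with sidelengths at least $2^{-J_\star}$. We consider the family $\mathcal{M}^{J_\star}_{r_\star}$ of all couples
$(m,\rho)$ with $m\in\mathcal{M}^{J_\star}$ and $\rho = (\rho_K)_{K\in m}$ such that, for all $K\in m$, $\rho_K\in\Lambda(r_\star)$. Ideally,
we would like to choose the couple $(m,\rho)$ that minimizes $\mathbb{E}_s\left[\|s-\hat{s}_{(m,\rho)}\|_2^2\right]$ among the
elements of $\mathcal{M}^{J_\star}_{r_\star}$. This is hopeless without knowing $s$, but from Pythagoras' Equality and
Proposition 3, we have, for all $(m,\rho)\in\mathcal{M}^{J_\star}_{r_\star}$,

\[ \mathbb{E}_s\left[\|s-\hat{s}_{(m,\rho)}\|_2^2\right] - \|s\|_2^2 = -\|s_{(m,\rho)}\|_2^2 + \frac{1}{n}\sum_{K\in m}\sum_{k\in\Lambda(\rho_K)} \mathrm{Var}_s\left(\Phi_{K,k}(Y_1)\right). \]

Thus, we propose to select an adequate partition $\hat{m}$ and the associated sequence of maximal
degrees $\hat{\rho} = (\hat{\rho}_K)_{K\in\hat{m}}$ from the data so that

\[ (\hat{m},\hat{\rho}) = \operatorname*{argmin}_{(m,\rho)\in\mathcal{M}^{J_\star}_{r_\star}} \left\{-\|\hat{s}_{(m,\rho)}\|_2^2 + \mathrm{pen}(m,\rho)\right\} = \operatorname*{argmin}_{(m,\rho)\in\mathcal{M}^{J_\star}_{r_\star}} \left\{\gamma(\hat{s}_{(m,\rho)}) + \mathrm{pen}(m,\rho)\right\} \]

where $\mathrm{pen} : \mathcal{M}^{J_\star}_{r_\star}\to\mathbb{R}_+$ is a so-called penalty function. We then estimate the density $s$
by

\[ \tilde{s} = \hat{s}_{(\hat{m},\hat{\rho})}. \]

According to the proof of Proposition 4, in view of proving the adaptivity of the penalized
estimator $\tilde{s}$, the penalty pen should be chosen so that $\tilde{s}$ satisfies an inequality akin to

\[ \mathbb{E}_s\left[\|s-\tilde{s}\|_2^2\right] \le C \min_{(m,\rho)\in\mathcal{M}^{J_\star}_{r_\star}} \left\{\|s-s_{(m,\rho)}\|_2^2 + \frac{\dim(S_{(m,\rho)})}{n}\right\} \tag{14} \]

where $C$ is a positive real that does not depend on $n$.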

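In order to define an adequate form of penalty, we introduce the set $\mathcal{D}^{J_\star}$ of all dyadic
rectangles of $[0,1]^d$ with sidelengths $\ge 2^{-J_\star}$ and, for all $K\in\mathcal{D}^{J_\star}$ and $k\in\Lambda(r_\star)$, we set

\[ \hat{\sigma}^2_{K,k} = \frac{1}{n(n-1)}\sum_{i=2}^n\sum_{j=1}^{i-1} \left(\Phi_{K,k}(Y_i)-\Phi_{K,k}(Y_j)\right)^2 \]

which is an unbiased estimator of $\mathrm{Var}_s(\Phi_{K,k}(Y_1))$. We also set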
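One standard unbiased variance estimator with a double sum over pairs $j < i$ is the pairwise-difference U-statistic $\frac{1}{n(n-1)}\sum_{j<i}(x_i - x_j)^2$. Assuming that form, the following sketch checks numerically that it coincides with the usual sample variance with denominator $n-1$; the function name is hypothetical, and the input plays the role of the values $\Phi_{K,k}(Y_1),\ldots,\Phi_{K,k}(Y_n)$.

```python
import numpy as np


def pairwise_variance(values):
    """Unbiased variance estimate: sum over pairs j < i of (x_i - x_j)^2, divided by n(n-1)."""
    v = np.asarray(values, dtype=float)
    n = v.size
    diffs = v[:, None] - v[None, :]
    # the full matrix counts every unordered pair twice, hence the factor 1/2
    return float(np.sum(diffs ** 2)) / 2.0 / (n * (n - 1))
```

The identity $\sum_{j<i}(x_i-x_j)^2 = n\sum_i (x_i-\bar{x})^2$ explains the agreement with `np.var(values, ddof=1)`; in practice the latter costs $O(n)$ operations instead of $O(n^2)$.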



fcGA(r*) V ^ 

that overestimate respectively 



Mi,* = -max V J ^) ' V^jr,fc(>») and Ms,* = - max max J^^^KkiYt), 



i=l 



n KeV^, fcGA(r*) ^ 

2=1 



max Pfmpllloo and max max T'&l' i^(yi)l . 
The following theorem suggests a form of penalty yielding an inequality close to (14).
Theorem 3 Let $r_\star\in\mathbb{N}^d$ and $J_\star\in\mathbb{N}$ be such that $|\Lambda(r_\star)| \le \max\{\exp(n)/n, n^d\}$ and
$2^{dJ_\star} \le n/\log(n|\Lambda(r_\star)|)$. Let $(L_{(m,\rho)})_{(m,\rho)\in\mathcal{M}^{J_\star}_{r_\star}}$ be a family of nonnegative real numbers,
that may depend on $n$, satisfying

\[ \sum_{(m,\rho)\in\mathcal{M}^{J_\star}_{r_\star}} \exp(-L_{(m,\rho)}|m|) \le 1. \tag{15} \]

If $s$ is bounded and pen is defined on $\mathcal{M}^{J_\star}_{r_\star}$ by



pen(m, p) = i ^ ^ (^^lO-^,/, + K2vr(fc)) 



n 

where ki , . . . , K5 are /ar^re enough positive constants, then 

[\\s - s\\l] < mill < - S(^^p)||2 + 4- X] X] Vars($A',fcC5^i)) 
/ ^„ ^dim(S'(^ )) i(m,p)l"i| 



II ' n I 

+ 4\\4lo^ir*)\Mr^)\^■ 
where positive reals, k'^, . . . ,k'^ only depend on ki, . . . , K5, and k'^ also depends 
on d. 

Thus, the penalty associated to each $(m,\rho) \in \mathcal{M}^{\deg}_\star$ is composed of two terms: an additive
term that overestimates the variance over the model $S_{(m,\rho)}$, and a term linear in the size
of the partition m, up to the weight $L_{(m,\rho)}$, that overestimates the upper-bound given in
Proposition 3 for the variance over $S_{(m,\rho)}$. There remains to choose those weights under the
constraint (15). According to Proposition 5 below, each model in $\mathcal{M}^{\deg}_\star$ can be assigned
the same weight that only depends on d and $r_\star$.

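Proposition 5 If $\kappa_1,\dots,\kappa_5$ are large enough positive constants, then the penalty defined
on $\mathcal{M}^{\deg}_\star$ by

$$\mathrm{pen}(m,\rho) = \frac{1}{n}\sum_{K\in m}\sum_{k\in\Lambda(\rho_K)}\big(\kappa_1\hat\sigma^2_{K,k}+\kappa_2\pi(k)\big) + \Big(\big(\kappa_3\widehat M_{2,\star}+\kappa_4\pi(r_\star)\big)|\Lambda(r_\star)|+\kappa_5\widehat M_{1,\star}\Big)\frac{\log(8d|\Lambda(r_\star)|)\,|m|}{n} \qquad (16)$$

satisfies the assumptions of Theorem 3. Moreover, if $|\Lambda(r_\star)| \le \max\{\exp(n)/n, n^d\}$, $2^{dJ_\star} \le
n/\log(n|\Lambda(r_\star)|)$ and s is bounded, then for such a penalty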

E4||»-.li]<." min_ |||»-»(„..„||l + .(r.)|A(r.)IINlL '°'^"'^'"''''''""'"' |. 

where $\kappa''$ is a positive real that only depends on $\kappa_1,\dots,\kappa_5$ and d.

Proof: First, for all $D \in \mathbb{N}^\star$, the number of partitions of $[0,1]^d$ into D dyadic rectangles
satisfies

$$|\mathcal{M}_D| \le (4d)^D. \qquad (17)$$

Indeed, as illustrated by Figure 2, each partition in $\mathcal{M}_D$ can be described by a complete
dyadic tree with D leaves whose edges are labeled with a sequence of D - 1 integers in
{1, . . . , d} giving the cutting directions to obtain the partition from the unit square.



Figure 2: Top: Partition of $[0,1]^2$ into dyadic rectangles. Bottom: Binary tree labeled with
the sequence of cutting directions (2, 1, 2, 2) corresponding with that partition.

The number of complete dyadic trees with D leaves is given by the Catalan number

$$\frac{1}{D}\binom{2(D-1)}{D-1} \le 4^D,$$

hence (17). We deduce from (17) that, for all positive real L,



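$$\sum_{(m,\rho)\in\mathcal{M}^{\deg}_\star}\exp(-L|m|) \le \sum_{D\in\mathbb{N}^\star}\ \sum_{(m,\rho)\in\mathcal{M}^{\deg}_\star,\,|m|=D}\exp(-L|m|) \le \sum_{D\in\mathbb{N}^\star}\big(4d|\Lambda(r_\star)|\big)^D\exp(-LD) \le \frac{1}{\exp\big(L-\log(4d|\Lambda(r_\star)|)\big)-1}.$$

So, we can choose $L \ge \log(8d|\Lambda(r_\star)|)$ for Condition (15) to be fulfilled.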

Since $\|s\|_\infty \ge 1$, the upper-bound for $\mathbb{E}_s[\|s-\tilde s\|_2^2]$ is then a straightforward
consequence of Theorem 3. ■
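The two counting arguments of this proof can be checked numerically on small cases. The sketch below is only an illustration under stated assumptions: a dyadic rectangle is encoded as a tuple of (level, index) pairs, one per coordinate, partitions are generated by successive dyadic cuts, and all function names are hypothetical. It compares the number of such partitions with the number of labeled trees and with $(4d)^D$, and evaluates the truncated sum $\sum_D (4d|\Lambda(r_\star)|)^D \exp(-LD)$ for a given weight L.

```python
import math


def split(rect, axis):
    """Cut a dyadic rectangle into its two halves along the given axis."""
    level, index = rect[axis]
    left = rect[:axis] + ((level + 1, 2 * index),) + rect[axis + 1:]
    right = rect[:axis] + ((level + 1, 2 * index + 1),) + rect[axis + 1:]
    return left, right


def count_dyadic_partitions(d, max_cells):
    """Count distinct partitions of [0,1]^d with 1..max_cells cells obtained by dyadic cuts."""
    root = tuple((0, 0) for _ in range(d))
    current = {frozenset([root])}
    counts = {1: 1}
    for size in range(2, max_cells + 1):
        refined = set()
        for partition in current:
            for rect in partition:
                for axis in range(d):
                    refined.add((partition - {rect}) | set(split(rect, axis)))
        current = refined
        counts[size] = len(current)
    return counts


def labeled_tree_count(d, cells):
    """Catalan number of binary trees with `cells` leaves times d**(cells-1) labelings."""
    return math.comb(2 * (cells - 1), cells - 1) // cells * d ** (cells - 1)


def kraft_sum(d, n_lambda, weight, max_cells):
    """Truncated sum over D of (4 d |Lambda|)^D exp(-weight D)."""
    rate = math.log(4 * d * n_lambda) - weight
    return sum(math.exp(cells * rate) for cells in range(1, max_cells + 1))


if __name__ == "__main__":
    print(count_dyadic_partitions(d=2, max_cells=5))
    print(kraft_sum(d=2, n_lambda=3, weight=math.log(8 * 2 * 3), max_cells=40))
```

With the weight $L = \log(8d|\Lambda(r_\star)|)$, each term of the sum equals $2^{-D}$, so the truncated sums stay below 1.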

It is worth pointing out that penalty (16) is more refined than the penalties proposed
by [Kle09] or [AD10] for density estimation via dyadic histogram selection based on a
least-squares type criterion. Indeed, when $r_\star$ is null, penalty (16) is not simply proportional to
the dimension of the partition.

With a penalty chosen as above, we recover an inequality close to (14), which allows us
to prove the adaptivity of $\tilde s$ over a wide range of classes $\mathcal{P}(\sigma,p,p',R,L)$ as defined in
Section 3.2. For that purpose, we introduce

a d + 2H(a) f Hia) fl 1 

q{d,a,p)- 



H{a) H{a) \ d \p 2 
and 

$$w(r_\star) = \pi(r_\star)\,|\Lambda(r_\star)|\,\log(8ed|\Lambda(r_\star)|).$$

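Theorem 4 Let $r_\star \in \mathbb{N}^d$ and $J_\star \in \mathbb{N}$ be such that $|\Lambda(r_\star)| \le \max\{\exp(n)/n, n^d\}$ and $J_\star =
\max\{J \in \mathbb{N} \text{ s.t. } 2^{dJ} \le n/\log(n|\Lambda(r_\star)|)\}$, and pen be the penalty given by Proposition 5.
For all p > 0, let $p' = \infty$ if $0 < p \le 1$ or $p \ge 2$, and $p' = p$ if $1 < p < 2$. For all L > 0,
$\sigma \in \prod_{l=1}^d (0, r_\star(l)+1)$, p > 0 such that $H(\sigma)/d > (1/p - 1/2)_+$ and $q(d,\sigma,p) > 1$, for all
R such that $w(r_\star)/n \le R^2 \le (n/\log(n|\Lambda(r_\star)|))^{q(d,\sigma,p)-1}$,

$$\sup_{s\in\mathcal{P}(\sigma,p,p',R,L)}\mathbb{E}_s\big[\|s-\tilde s\|_2^2\big] \le C\,\big(w(r_\star)\big)^{2H(\sigma)/(d+2H(\sigma))}\inf_{\hat s}\sup_{s\in\mathcal{P}(\sigma,p,p',R,L)}\mathbb{E}_s\big[\|s-\hat s\|_2^2\big],$$

where C only depends on $d, \sigma, p, L$ and the penalty constants $\kappa_1,\dots,\kappa_5$, and the above
infimum is taken over all the estimators $\hat s$ of s.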

Thus, if $r_\star$ is chosen as a constant with respect to n, then $\tilde s$ reaches the minimax risk,
up to a constant factor, over a wide range of classes that contain functions with possibly
anisotropic and inhomogeneous smoothness limited by the maximal degrees $r_\star$. Another
strategy consists in allowing the maximal degrees to increase with the sample size n,
while $w(r_\star)$ varies slowly with n. For instance, with $r_\star(l) = \log(n)$ for all $l = 1,\dots,d$,
our estimator $\tilde s$ still approximately reaches the minimax risk over a range of classes all
the wider as n increases. The price to pay is only a logarithmic factor, proportional to
(log(log(n)) log^'^(n))^-'^('^)/('^+^^('^)) over classes with smoothness H{(t). Thus, such a re- 
sult may be seen as a nonasymptotic and multivariate counterpart of Theorem 1 in Willett 
and Nowak fWNOTj . 

Remark: Contrary to |NvS97| INeuOO| IKLPOll IKle09| , we have chosen here the smooth- 
ing parameter independently of the smoothness of s, hence the restriction on q{d,cr,p), 
that could disappear otherwise. Setting /i^- = H{cr)/a, the condition q{d,cr,p) > 1 is 
equivalent to H[a)/d > i/(cr,p), where 

In case of isotropic and homogeneous smoothness, i.e. when /io- = 1 and p > 2, q{d,cr,p) > 
1 is simply equivalent to H{cr)/d > 0. In case of isotropic and inhomogeneous smoothness, 
i.e. when /Uo- = 1 and p < 2, q{d,a,p) > 1 is equivalent to H{cr)/d > v{(T,p) where 
^i^^p) £ This is slightly stronger than /7(<T)/(i > 1/2, but still better 

than the restriction H[a)/d > 1/p which is often encountered in the literature. Otherwise, 
iy{cr,p) increases with /i^- and 1/p, i.e. with the anisotropy and the inhomogeneity. 



3.4 Implementing the dyadic piecewise polynomial selection procedure 

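We end this article with a brief discussion about the implementation of our estimator $\tilde s$ for
the penalty defined in Proposition 5. Let us fix the penalty constants $\kappa_1,\dots,\kappa_5$ and set,
for all dyadic rectangle $K \in \mathcal{D}_\star$ and all $r \in \Lambda(r_\star)$,

$$W(K,r) = \sum_{k\in\Lambda(r)}\Big(-\Big(\frac{1}{n}\sum_{i=1}^n\Phi_{K,k}(Y_i)\Big)^2 + \frac{\kappa_1\hat\sigma^2_{K,k}+\kappa_2\pi(k)}{n}\Big) + \frac{\log(8d|\Lambda(r_\star)|)}{n}\Big(\big(\kappa_3\widehat M_{2,\star}+\kappa_4\pi(r_\star)\big)|\Lambda(r_\star)|+\kappa_5\widehat M_{1,\star}\Big)$$

and

$$\hat r_K = \operatorname*{argmin}_{r\in\Lambda(r_\star)} W(K,r).$$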

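Given the decomposition of $\hat s_{(m,\rho)}$ in the basis $(\Phi_{K,k})_{K\in m,\,k\in\Lambda(\rho_K)}$, the model $(\hat m,\hat\rho)$ to
select in $\mathcal{M}^{\deg}_\star$ is characterized by

$$(\hat m,\hat\rho) = \operatorname*{argmin}_{(m,\rho)\in\mathcal{M}^{\deg}_\star}\sum_{K\in m}W(K,\rho_K),$$

so

$$\hat m = \operatorname*{argmin}_{m\in\mathcal{M}_\star}\sum_{K\in m}W(K,\hat r_K)$$

and, for all $K \in \hat m$, $\hat\rho_K = \hat r_K$.
Thus, the steps leading to $\tilde s$ are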
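1. Compute $\widehat M_{1,\star}$ and $\widehat M_{2,\star}$.

2. For all $K \in \mathcal{D}_\star$ and all $k \in \Lambda(r_\star)$, compute $\hat\sigma^2_{K,k}$.

3. For all $K \in \mathcal{D}_\star$, compute $\hat r_K$ and $W(K,\hat r_K)$.

4. Determine the best partition $\hat m = \operatorname*{argmin}_{m\in\mathcal{M}_\star}\sum_{K\in m}W(K,\hat r_K)$.

5. Set, for all $K \in \hat m$, $\hat\rho_K = \hat r_K$.

6. Compute $\tilde s = \hat s_{(\hat m,\hat\rho)}$.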
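Steps 2 and 3 can be sketched in dimension d = 1 as follows. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes that $\Phi_{K,k}$ is the rescaled orthonormal Legendre polynomial of degree k on K, that $\pi(k) = 2k+1$, and that the part of the criterion that does not depend on r is passed in as a precomputed number `extra`. All function and parameter names are hypothetical.

```python
import math


def legendre(k, x):
    """Legendre polynomial P_k(x) via the three-term recurrence."""
    if k == 0:
        return 1.0
    p_prev, p = 1.0, x
    for j in range(1, k):
        p_prev, p = p, ((2 * j + 1) * x * p - j * p_prev) / (j + 1)
    return p


def phi(a, b, k, y):
    """Orthonormal Legendre function of degree k on [a, b), zero outside (assumed basis)."""
    if not (a <= y < b):
        return 0.0
    return math.sqrt((2 * k + 1) / (b - a)) * legendre(k, 2 * (y - a) / (b - a) - 1)


def unbiased_variance(values):
    """Usual unbiased sample variance, computed in linear time."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def local_criterion(sample, a, b, r_max, kappa1, kappa2, extra):
    """Return (r_hat, W(K, r_hat)) for K = [a, b) and maximal degrees r = 0..r_max."""
    n = len(sample)
    best = None
    running = 0.0  # cumulative sum over k <= r of the degree-dependent terms
    for k in range(r_max + 1):
        values = [phi(a, b, k, y) for y in sample]
        coefficient = sum(values) / n  # empirical coefficient on Phi_{K,k}
        pi_k = 2 * k + 1               # assumed form of pi(k) in dimension 1
        running += -coefficient ** 2 + (kappa1 * unbiased_variance(values) + kappa2 * pi_k) / n
        w = running + extra
        if best is None or w < best[1]:
            best = (k, w)
    return best


if __name__ == "__main__":
    grid = [(i + 0.5) / 200 for i in range(200)]
    print(local_criterion(grid, 0.0, 1.0, r_max=3, kappa1=1.0, kappa2=1.0, extra=0.0))
```

Since the criterion is a cumulative sum over degrees, all values W(K, r) for one rectangle are obtained in a single sweep over k.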

Since m is the partition in that minimizes a given additive criterion, it can be deter- 
mined via the algorithm inspired from Donoho |Don97| and described in i BSR04] (beginning 
of Section 3), with a computational complexity at most of order Therefore, one 

easily verifies that the whole steps only require a computational complexity at most of order 
$O(|\Lambda(r_\star)||\mathcal{D}_\star|)$. Since $|\mathcal{D}_\star| = (2^{J_\star+1}-1)^d$, if we choose $J_\star$ as prescribed by Theorem 4,
then determining $\tilde s$ requires at most $O(n)$ computations when $r_\star$ is constant, and at most
$O(n\log^{d}(n))$ when $r_\star(l) = \log(n)$ for all $l = 1,\dots,d$. Last, regarding the choice of the
penalty constants $\kappa_1,\dots,\kappa_5$, they can be calibrated via simulations over a wide collection
of test densities. Such a method has already proved to yield good results in practice, even
though several constants have to be chosen, as shown for instance in [CR04].
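Step 4 only uses the additivity of the criterion over the cells of the partition. A minimal sketch of a recursive search exploiting this, in the spirit of dyadic CART but not necessarily identical to the algorithm of [BSR04], is given below. The encoding of a rectangle as a tuple of (level, index) pairs and the names `best_dyadic_partition` and `cost` are assumptions made for illustration; `cost(K)` plays the role of $W(K,\hat r_K)$.

```python
from functools import lru_cache


def best_dyadic_partition(cost, d, j_max):
    """Minimise the sum of cost(K) over partitions of [0,1]^d into dyadic
    rectangles with sidelengths >= 2**-j_max.

    Returns (optimal value, tuple of cells, number of rectangles visited).
    """

    @lru_cache(maxsize=None)
    def solve(rect):
        # Option 1: keep the rectangle as a single cell.
        best_value, best_cells = cost(rect), (rect,)
        # Option 2: cut it in two halves along one axis and recurse.
        for axis, (level, index) in enumerate(rect):
            if level == j_max:
                continue  # side already has the minimal allowed length
            left = rect[:axis] + ((level + 1, 2 * index),) + rect[axis + 1:]
            right = rect[:axis] + ((level + 1, 2 * index + 1),) + rect[axis + 1:]
            left_value, left_cells = solve(left)
            right_value, right_cells = solve(right)
            if left_value + right_value < best_value:
                best_value = left_value + right_value
                best_cells = left_cells + right_cells
        return best_value, best_cells

    root = tuple((0, 0) for _ in range(d))
    value, cells = solve(root)
    return value, cells, solve.cache_info().currsize


if __name__ == "__main__":
    # Toy criterion (hypothetical): only the four quadrants of the unit square are free.
    def toy_cost(rect):
        return 0.0 if all(level == 1 for level, _ in rect) else 10.0

    print(best_dyadic_partition(toy_cost, d=2, j_max=2))
```

Thanks to memoization, each dyadic rectangle is processed once, so the number of calls to `cost` equals $(2^{j_{\max}+1}-1)^d$ in this sketch.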

4 Proofs of the approximation results 

For J G N and K G VJ , we recall that the children of K are all the dyadic rectangles of 
I^J^i that are included in K. We will also refer to K as the parent of its children and will 
often use the fact that the children of K form a partition of K into 

d 

Y^2^^^^^''-^"''^~^^-^"''^ <: 2'^2'^-/^^"'^ (18) 
1=1 

dyadic rectangles from T^J-^i- 

In all the proofs, the notation $C(\theta)$ stands for a positive real that only depends on the
parameter $\theta$, and whose value is allowed to change from one occurrence to another.

4.1 Proof of Proposition 1

For $p \ge q$, Proposition 1 follows from the continuous embedding of $L_p([0,1]^d)$ in $L_q([0,1]^d)$.
For p < q, it corresponds with the second point in the more general result below.
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Proposition 6 Let R > 0, a = {ai, . . . , Od) € nf=i(0' n + 1), 1 < 9 < oo and < p < q 
such that H{a)/d > 1/p — 1/q. For s S Lp([0, l]'^), € N, and any dyadic rectangle 
K G V^, we set 

er,<T,p,k{s,K) = inf j|(s - P)I;^ 

If Nr,a,p,oo{s) < R, then 
i) for all j G N and all K G VJ , 

£r{s, K\ < C{d, r, a,p, q) ^ 2-'=^(^(-)/'^+i/'?-i/^')^/^(-)2'=^e,,^,p,fc(s, K). (19) 

k>j 

a) s G Lg([0, 1]'^) and \\s\\q < C{d, r, a,p, q')(||s||p + R). 

Proof: Let us fix 1 < g < oo, < p < j G N and K G "DJ . For all k > j, we denote by 
Ck{K) the set of all rectangles from that are included in K. Thus, Cj(K) is reduced to 
{-fC}, Cj+i{K) is the set of all the children of K, etc. . . . For any rectangle I C [0, l]'^, we 
denote by Pi{s) a polynomial function on I with degree < r; in the l-th. direction such that 

\\{S-Pl{s))ll\\p = £r{s,l)p, 

where £r{s,I)p is defined as in (|4j). For all k > j, we set 

^k{s,K)= ^ Pjis)li 

and, in order to alleviate the notation, we simply write ek{s, K) instead of er^cr,p,k{s, K) in 
the whole proof. It should be noticed that efc(s, [0, 1]"^) = er,cr,p,k{s) as defined by ([7]), and 
that 

i/p 

ek{s,K) = \\{s-T.k{s,K))lK\\p=\ ^r{s,I\' 

\ieCk{K) 

Therefore, 

||(s-Sfc(s,K))Ii^||p< Yl =er,^,p,k{s)<2->'^R 

so that the sequence (Sfc(s, -R'))fc>j converges to sIk in ILp([0, l]*^). 

Let us prove that ii'))jt>j also converges to six in Lq([0, l]'^). We now fix A; > j. 

When 0<p<g<ooas assumed here, Markov Inequality for polynomials asserts that, for 
all rectangle / of [0, 1]*^, and all polynomial function P G !3^r^ 

\\PMU < C{d,r,p,q){Xd{I))^^/'-'/P^\\Pli\\p. (20) 
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We refer to Lemma 5.1 in |Hoc02aj for a proof (that still holds for q = oo). Let us first 
assume that < p < q < oo. We then deduce from (|20p that 

\\j:k+i{s,K)-^k{s,K)\\l 

< C{d,r,p,q) Yl iUl)r^'^'-'^''^\\{^k+i{s,K) - ^k{s,K))li\\l 

I&Ck+iiK) 

< C((i,r,p,9)2'?(^-+i)'^(i/P-i/^)^/^W \\{^k+i{s,K)-J:kis,K))li\\l. (21) 

ieCk+i{K) 

Let us also fix I G Ck+i{K). Then 

(Sfe+i(s,K) - ^k{s,K))li = {Pi{s) - Pj{s))li 

where / E Ck{K) is the parent of /. Let k(p) = 2^/^ if p < 1, and k{p) = 1 otherwise. From 
the (quasi-)triangle inequality satisfied by ||.||p, we then get 

\\{j:k+i{s,K) - Sfc(s, K))li\\p < k{p) {\\{s - Pi{s))li\\p + 11(5 - Pi{s))li\\p) 

< k{p) [£r{s,I)p + £r{s,i)p] , 



hence, by convexity of x i— )• x"^, 

\\{^k+i{s,K)-^k{s,K))li\\l < 2-^-1^^ (p) (^£^,{s,l)p + £^,{sj)p^ 
By grouping all the rectangles / G Ck+i{K) that have the same parent, we obtain 

The classical inequality between ip and £g-(quasi-)norms 



for < p < g < oo (22) 



then provides 



Y \m+i{s,K) - ^u{s,K))ti\\l < 2'i-^K'i{p) (e^_^,(s,K) + 2'^(i+^/^(-))e^(s,i^)) . 

/GCfc+i{K) 
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Since {ek{s, K))k € N is a decreasing sequence, by setting r = H(cr)/d + 1/q — 1/p and 
combining Inequality (f2T]) with the above inequality, we obtain 



\\^k+i{s,K)-^k{s,K)\\g < C{d,r,a,p,q) (2''^ek{s, K)j 2-^'^-^/^(-) 

< C{d,r,a,p, q)R2-''^^^/^^''\ 

We can prove in the same way that such an upper-bound still holds for q = oo. Since r > 0, 
for all < p < q < oo, (Sfe(s, K))k>j also converges in Lg([0, l]'^) to sIk- In particular, we 
have thus proved that s € Lq([0, 1]*^). 

From the definition of 8r{s,K)q and the triangle inequality, it follows that 

£r{s,K\ < \\{s - Pk{s))1k\U 

<Y,\\^k+i{s,K)-J:k{s,K% 

k>j 

< C(d,r,a,p,<7) j;2-'=^-^/^(-)2'=^efc(s,K). (23) 

k>j 

We have thus proved (jlOh . and the above inequality for K = [0, 1]*^ combined with Markov 
Inequality (j20p also provides 

Ils||g < \\P[0,l]4s)\\q + I|S - -P[0,l]rf(s)llg 

<C{d,r,a,p,q){\\P[0^,]4s)\\p + R) 

< C{d, r, a,p, q){\\s\\p + £r{s, [0, 1]'^)^ + R) 

< Cid,r,(T,p,q){\\s\\p + R). 



4.2 Proofs of Theorems 1 and 2

A first approximation result for the algorithm described in Section 2 can be stated as
follows.

Proposition 7 Let k e N, R > 0, a = {ai,...,ad) € Ylf=i{0,ri + I), < p < oo, 

1 < g < oo and s G ILq([0, l]"*)- Assume that 

H{cT)/d>{l/p-l/q)+ 

and that 
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Then, there exists some partition m of [0, 1]*^ that only contains dyadic rectangles from T)^ 

\m\ < Ci{d,a,p)2'"^ 



and E 'S'(m,r) such that 



and 

\\S - S(^rn,r)\U < C2{d, (T,p, 

Besides, if for some J G N, s is polynomial with coordinate degree < r over each rectangle 
ofDj, then m only contains dyadic rectangles from U^^o^J- 

Proof: For A; = 0, we can just choose m as the trivial partition of [0, 1]"^ and S(m,r) the 
polynomial of best Lg-approximation over [0, l]'^ in Indeed, we then have 

I|S - S{m,r)llq = ^r(s, [0, lf)q < R, 

where the last inequality follows from (j24p . Let us now fix A; > 1, set 

r = H{a)/d - {1/p - l/q)+ and A = 2(i+(i+^f)^/^('^))'^/P, 

and choose 

If X(s,e) is trivial, then the upper-bound ([5]) provides 

\\s-A{s,e)\\q < e < \R2-^"^'^\ 

Let us now assume that X(s, e) is not trivial and fix j > 1 such that I{s, e) H DJ is not 
empty. If X € I{s, e) H VJ, then K is a child of a dyadic rectangle K € T^J-i such that 

e < Sr{s,k)q, 

hence 

By grouping all the rectangles K G X(s, e) PlPJ having the same parent in T^f-i, and taking 
into account Remark (|18p . we obtain 



|I(s,e) nVJle^ < 2°'(^+(^+P'^)-/'^('^))2~-''*P'^-/'^'^'^)i?P. 
Replacing e by its value, we deduce that 

|X(s, e) nVJ\< 2fc'i(i+P^)2-J*^^/^('^). (25) 

Besides, for all j > 1, 

\I{s,e)nVJ\ < \VJ\ < 2^''^^/^('^). 
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Let us denote by J the greatest integer j > 1 such that 

'2] d.a_/ H {(t) ^ i2kd{l+pT) i2~jdpTa_/ H{(t) 

i.e. such that 

Since a/H{cr) < 1, the last inequahty is satisfied hy k > 1 for instance, so that J is 
weh-defined. Besides, J is characterized by 



Therefore, 



where 



|X(s,e)|= j;|Z(s,e)nP7| 

i>i 
J 

3=1 j>J+l 

<Ciid,a,p)2'"' 

2dgL/H{(T) 



Ci{d,a,p) = ——— - + 



2dCT///((T) Y 1 2~'^P'''^/^i'^) ' 

Moreover, we deduce from ([5]) that, if 1 < g < oo, then 

\\s-A{s,e)\\g<\I{s,e)\'/'^e<Cl^'{d,a,p)R2-''''('^\ 

and we deduce from ([6]) that, if g = oo, then 

||s-yl(s,e)|loo < e < XR2'''^^''\ 

So Proposition [7] is satisfied for 



C2{d,o-,p,q) 



' Cl''^{d,a,p)\ ifl<g<oo 
A if g = oo, 



m = I{s,e) and s^^ ,,) = A{s,e). The last assertion in Proposition [7] is a straightforward 
consequence of the approximation algorithm. ■ 

The following lemma allows to link Assumption (|24p with the (quasi-)semi-norm iVr,(T,p,p'- 
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Lemma 1 Let R > 0, < p < oo, 1 < q < oo and cr = {ai, . . . , ad) € n«=i(Oi + 1) such 
that H{a) /d > {1/p — l/q)j^. Assume that s € S{r,a,p,p',R), where p' = ooifO<p<l 
or p > q and p' = p if 1 < p < q, then 

sup2^'^(^/^('^))(^('^)/'=^-(^/P-i/'')+) I ^ £-P(s,K)q\ <C{d,r,a,p,q)R. (26) 



Proof: U p > q, then the left-hand side of Inequahty (j26|) is upper-bounded by 
sup 2^^ V £Pis, K)p = sup 2^^er a p j (s) < R. 

Let us now assume that p < q and set r = H{cr)/d + 1/q — 1/p. From Inequahty (|19p in 
Proposition [U we deduce that 

1/p 

\Kevj 

<C{d,r,CT,p,q)2^'^^^/"^''U I ^2-^'^"^/^('^)2'=^efc(s,i^) I | . (27) 



If < p < 1, then the classical inequality between £p and £i-(quasi-)norms recalled in ([22 
leads to 

1/p 

\Kevj 

1/p 

<C{d,r,a,p,q)2^'^^^/"^'''^ [ ^ Y^2-^P'^^^/"^'^h''P^elis, K) 

^K€VJ k>j 

1/p 

<C{d,r,a,p,q)V^^^I^^''^ I ^ 2-^P'^^^/^('^)2*^P^e^(s, [0, l]*^) 



1/p 



< C{d, r, cT,p, q) sup (2^^efc(s, [0, l]*^)) 2^'^^^/^('^) ^ 2-^P'^^^/^('^) 
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hence Inequality (|26p . If 1 < p < oo, then there exists 1 < < oo such that l/p+l/p*' = 1, 
so we obtain by applying Holder inequality to (j27p that 



< C(d,r,cr,p,g)2^'^"2i/J^('T) 



p/p* 



i/p 



Kev^ \k>j 



1/p 



<C{d,r,a,p,q)\^2^^P^ ^ el{s,K) 

k>j KGVf 



1/p 



<Cid,r,CT,p,q) I ^2^P^el{s,[0,lY 



hence Inequality ([26]) . ■ 

Last, Lemma[2]provides an upper-bound for the linear approximation error of 5(r , cr,p,p' ,R) 
by IIj in the Lg-norm. 

Lemma 2 Let R > 0, < p < oo, 1 < q < oo, cr = {ai, ... , ad) £ nf=i(Oi H + 1) ■^^'^/i 
t/iai H{cr)/d > — l/g)+, anc/ = 2^/^ if < p < 1, and 1 otherwise. Assume that 
s € 5(r , cr,p,p' ,R) where p' = ooifO<p<l or p > q, and p'=pifl<p<q. Then, for 
all J G N, there exists a function sj £ nj'*^ such that sj S S{r, cr,p,p', 2k{p)R) and 

\\s - s.j\\q < C{d, r, cT,p, g)2-M^^('T)/rf-(i/p-i/g)+)o:///(^)^^ (28) 

Proof: For all K G DJ, we denote by Pxis) a polynomial function on K with degree 
< ri in the l-th direction such that 

\\{s-PK{s))ll\\p = £r{s,K)p, 

and we set 

sj= Yl Pk{s)1k. 

In order to alleviate the notation, we simply write efc(s) instead of Cr^cr.p.kis), and ek{s,K) 
instead of er^cr,p,kis, K), as in the proof of Proposition [6l 
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Since sj E Ily , ek{sj) = for k > J. If k < J, then the (quasi-)triangle inequahty, 
the definition of sj and the inclusion n^'*^ C Hj provide successively 

efc(sj) < K,{p) - sjWp + efc(s)) 

< k{p) (ej(s) + efc(s)) 

< 2K(p)efc(s). 

Therefore, Nr^cr,p,p'isj) < 2K{p)Nr^a-,p,p'{s), so that sj € S{r,cr,p,p' ,2k{p)R). 
U p > q, then 

lis - sjllg < lis - sjIIp = Yl ^ris, K)p = er,^,p,jis) < 

\Kevj J 

If p < g < oo, then we deduce from Inequality (I23p in the proof of Proposition [6] and from 
Inequality (j22p between £p and £q-(quasi-)norms that 



5] ||(s-P^(.))I,,||^ 



<C7(d,r,a,p,g) [ Yl (^2-'='^(^('^)/'^+V5-VpW^(-)2'^^e,(s,K)' 

We then obtain Inequality (j28p either thanks to the inequality between £i and ^^-(quasi- 
)norms in case < p < 1, or thanks to Holder Inequality otherwise. Last, if q = oo, then 
we still deduce from Inequality (j23p that 

||s - SjIIoo = ||(s - PK{s))tK\\oo 

< Cid,r,cr,p,q) m^^^ ( ^ 2-'='^W'^)/'^+i/«-i/P)^/^('^)2'=^efc(s, i^) 



, fc>J 



< r, a,p, q)2~-Jd{H{a)/d+l/q~l/p)g:/H{a)j^_ 

■ 

Theorem 1 is then a straightforward consequence of Proposition 7 and Lemma 1. To
prove Theorem 2, for each $J \in \mathbb{N}$, we just have to apply Proposition 7 and Lemma 1 to the
function $s_J$ given by Lemma 2 and use the triangle inequality

$$\inf_{t\in S_{(m,r)}}\|s-t\|_q \le \|s-s_J\|_q + \inf_{t\in S_{(m,r)}}\|s_J-t\|_q,$$

where m can be any partition of $[0,1]^d$ into dyadic rectangles.
5 Proof of Theorem 3

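In the following proof, we denote by $(w_{(m,\rho)})_{(m,\rho)\in\mathcal{M}^{\deg}_\star}$ a family of nonnegative reals and
set $\Sigma = \sum_{(m,\rho)\in\mathcal{M}^{\deg}_\star}\exp(-w_{(m,\rho)})$. We fix $(m,\rho) \in \mathcal{M}^{\deg}_\star$ as well as some positive reals
$\zeta, \theta_1,\dots,\theta_8$ such that $2\theta_1(1+\theta_2) < 1$ and $\theta_8 < 1$.
From the definition of $\tilde s = \hat s_{(\hat m,\hat\rho)}$ it follows that

$$\gamma(\tilde s) + \mathrm{pen}(\hat m,\hat\rho) \le \gamma(\hat s_{(m,\rho)}) + \mathrm{pen}(m,\rho). \qquad (29)$$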

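For all $t, u \in L_2([0,1]^d)$,

$$\gamma(t) - \gamma(u) = \|s-t\|_2^2 - \|s-u\|_2^2 - 2\nu(t-u), \qquad (30)$$

where

$$\nu(t) = \frac{1}{n}\sum_{i=1}^n\big(t(Y_i) - \langle t, s\rangle\big).$$

Besides, for all $(m',\rho') \in \mathcal{M}^{\deg}_\star$, setting

$$\chi(m',\rho') = \|s_{(m',\rho')} - \hat s_{(m',\rho')}\|_2,$$

we obtain by developing $\hat s_{(m',\rho')}$ and $s_{(m',\rho')}$ in the orthonormal basis $(\Phi_{K,k})_{K\in m',\,k\in\Lambda(\rho'_K)}$
and using the linearity of $\nu$

$$\chi^2(m',\rho') = \sum_{K\in m'}\sum_{k\in\Lambda(\rho'_K)}\nu^2(\Phi_{K,k}) = \nu\big(\hat s_{(m',\rho')} - s_{(m',\rho')}\big). \qquad (31)$$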

From Equalities (30), (31), Pythagoras' Equality and the linearity of $\nu$, we deduce

$$\gamma(\tilde s) - \gamma(\hat s_{(m,\rho)}) = \|s-\tilde s\|_2^2 - \|s-s_{(m,\rho)}\|_2^2 + \chi^2(m,\rho) - 2\chi^2(\hat m,\hat\rho) - 2\nu\big(s_{(\hat m,\hat\rho)} - s_{(m,\rho)}\big),$$

which, combined with Inequality (29), leads to

$$\|s-\tilde s\|_2^2 \le \|s-s_{(m,\rho)}\|_2^2 + \mathrm{pen}(m,\rho) - \chi^2(m,\rho) + 2\chi^2(\hat m,\hat\rho) + 2\nu\big(s_{(\hat m,\hat\rho)} - s_{(m,\rho)}\big) - \mathrm{pen}(\hat m,\hat\rho). \qquad (32)$$

We shall now provide an upper-bound for the term $2\nu(s_{(\hat m,\hat\rho)} - s_{(m,\rho)})$ on an event with
great probability. From Bernstein's Inequality, as stated for instance in [Mas07] (Section
2.2.3), for all bounded function $t : [0,1]^d \to \mathbb{R}$ and all x > 0,


„ , < exp(-x). (33) 
3 n 
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Let us fix {m',p') E M.^^^ and apply Bernstein's Inequality to t = S(^rn',p') ~ ^{m,p)- Since 
k has support K, 



\s{m',p')\\oo = max 



{s,<^K,k)^K,k 



< M 



1,* 



where 



' 7r(fc) 

Kev^ ^ V \AK) 

keAir^,) V '^^ > 



Ml ^ = max > 



\{s,'^K,k)\, 



so 



\\S{m',p') - S(m,p)\\oo < 2Mi_^. 

Since ^(^ p) and S^^^/^p/^ are both subspaces of ^(^^ ,,^), 



< Mi^^,\\s(^m',p') - S(rn,p)\\2- 



,,p))^dAQ 



From (j33p . there exists a set n{m, p,m' , p' ,Q such that (r2(m, p, m', p', C)) > 1 — 
exp(— (tfj-^/ p/) + (^)) and over which 



/ ^. II ||2'^(m',p') + , W(^rn',p')+C 
^ \^{m',p') - ^im,p)) S \ ^^Wl,*l|S(m',p') " S(m,p)ll2 r -^^^1 



n 



n 



(34) 



We recall that, for all $a, b \ge 0$ and $\theta > 0$,

$$2ab \le \theta a^2 + \theta^{-1}b^2.$$
Thus, on il(m, p, m', p', ^), we have 

/ \ 2 1 — 1^ ^(m' p') ~l~ C 

Besides, using the triangle inequality, (|34p . and Pythagoras' Equality, we obtain 

l|S(m',p') - •S(m,p) II2 ^ (l|s ~ S(m',p')ll2 + ll^ " S(m,p)l|2)^ 

< (1 + 02)P-S(m',p')ll2 + (1 + ^2^^) I|s-S(m,p)ll2 

< (1 + 02)P - S^m',p') II2 - (1 + ^2)X'("1', P') + (1 + e^^) lis - S(^,p) Hi. 

Therefore, the set ^(m,p)iC) = pi)^j\/i''-<'3^i''^^ P^"^' ^ P' ^0 is an event with probability 

Fs(f^Kp)(C)) >l-exp(-C)S (35) 






over which 

(S(rn,p) - S(„,p)) < 20i(l + e2)\\s - sg + 2^1 (l + 6^^) \\s - 

- 20i(l + e2)x'(r^, P) + 2 (2/3 + ^r') Mi,. "'^'^'^^^^ . (36) 

n 

Let us now provide a concentration inequality for $\chi^2(\hat m,\hat\rho)$. For that purpose, we first
prove the following result.

Proposition 8 Let {m' , p') G M'^^^,x > 0, 

= Yj Y1 (^kfe(^l)) = [\\s(m',p') - S[m,p)\\l] 

and 

^2,*= ^ max E,[$|^;,(yi)]. 
There exist an event fl^ that does not depend on {m' , p') and an event fi^^/ ^^■((x) such that 

P,(J7^)<2'^+V(^'log(n)), (37) 

and, on p/-)(x), 

x\m', p')ln^ < (1 + 03)(1 + ^4) 

+ 4 (1 + ^4-^) IA(r.)| ((4/3 + M2,. + (5/3) (l + 3^3-^) 7r(r.)) ^. 

Proof: Let us fix x > 0, and set, for all K £ and fc G ^{f*), 



(^K,k = Vars («>/c,fc(yi)) , eK,k = ^J^fy^^k + 2\Ar(fc), 

^-^=0 n {l'^(^i^,fc)| <ex,fe\/Ad(i^)}. 

From Bernstein's Inequality (see for instance |Mas07) . Section 2.2.3), for all K G V^, 
k G A(rj,) and x > 0, 



H...)l>,/24-4 + ^^2l<2»p(-.), 



SO 

' ^(^A',fe)| > eK,kVUK)) < 2exp(-3nArf(i^)) < 2 exp(-3n2-'^''*). 
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Besides, there are 2*^*+^ — 1 dyadic intervals of [0, 1] with length > 2^'^*, so < 2'^(^+-^*). 
And we assume that 2'^"^* < n/log(n|A(rj,)|), hence the upper-bound for F5(ri^). 
Let us also fix (m',p') G Ad*^^^ , set 



ei^,fcV^(fc) and V.pO(^) = \ Ln /o /n-U ' 
fceA(p^) V ^^V'i + '^s )^ 

choose ^i^rn',p') a Countable and dense subset of 7(m',p') = ^ 'S'(m',p')/ll^||2 = 1, \\t\\oo < ^(m',p')(^)} ' 
and define 

Z{m',p')= sup z^(i) = sup 
Since <I>_R-,fc has support if, for all t G S(^jn',p')) 



[t2(yi)] =E, 



2- 



< lA(r,)|M2,.|!t||i. 

So Talagrand's Inequality, as stated for instance in |Mas07| (Chapter 5, Inequality (5.50)), 
ensures that there exists an event ^(^m' ,p'){^) such that ¥s{^(^m' ,p')i^)) ^ 1 ~ exp(— x) and 
over which 



Z{m',p') < (l + 03)Es [Z{m',p')]+^2\Air.)\M2,^^ + ^2{l/3 + 9^')v^m',p')^- 

Since $\nu$ is linear, we deduce from Cauchy-Schwarz Inequality and its equality case that

Xim,p')= sup z/(t) = z/(t'^,^^,^) 

*£'S'(m',p')'ll*ll2 = l 



where 



Therefore, 



K£m' k&A{p'j^) ' 



E, [Z(m',p')] < E, [x(m',p')] < V^IFK^ = a/^( 



'(m',p')- 
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Moreover, on the set fi^^/ p/-)(x)nil^, either x{^\ p') ^ y 2(1/3 + ^3 ^)v(^j^i^pi-^x/n, in which 
case G T{m',p'), so that 

xim',p') = Z{m',p') 



< (1 + ^3) + ^/2|A(r.)|M2,.^ + ^2(1/3 + e^^)v^^, ^p,)^, 



or xi'i^'i p') < Y 2(1/3 + ^3 ^)v{jn' ,p')x/n, and the above inequahty is stih satisfied. Apply- 
ing Inequality with = 1, we get 

^^(m',p') < max, X] ('^^.fc + 57r(fc)) < |A(r-^)| (M2,* + 57r(r^)) . 
Consequently, on f7(^/ p/)(x), 



X(m',p')lc, < (1 + ^3)VV,P') 



+ + V (1/3 + ^3"')(^^V + 5^(r,)^ y2|A(r.)|^. 



Thus, applying twice Inequality (|34p . with 6 = 64^ and ^ = 1, we get the concentration 
inequality for xi'^^'iP') stated in Proposition [HI ■ 

From Proposition [HI we deduce that f^x(C) = ^[m' p')i=M.'^''3^{'m' ,p')i'^{m' ,p') +0 is an event 
with probability 

(n^iO) > 1 - exp(-C)S 

over which 

x\m,p)ln^ < (1 + ^3)(1 + ^4)%,p) 

+ 4 (1 + 9,') |A(r.)| ((4/3 + 6^^) M2,. + (5/3) (l + 36^^) 7r(n)) !fV^i^±i . (38) 

Our main task is then to estimate the unknown variance terms V(m',p')) -^1,*, -/\^2,jr- 
Lemma 1 in | |RBRTM10| remains valid with the same constants even though the l^'s take 
values in R"^ with d > 1. Let us set 7 = 3 + log |A(r,t)|/ log(n). Since |A(rj,)| < n'^, 
7 is bounded independently ofn ( 3 < 7 < 3 + d)). So, from the proof of Lemma 1 
in [ RBRTM10| . for all K ^ TD^, and k G A(rj,), there exists an event ^K,k such that 
IPs(J^Kfc) ^ C(6'5,(i)/(n3|A(r^)|) and over which 



Var, {^kAYi)) < (1 + ^5) ^K,k + ,k lloo 



^011^... II Jn^,;,2 l^gH , . I|2 W 



, .1 00 

n n 



< (1 + ^5) [^K,k + 2j8alMk) + 32^(fc) 
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Applying Inequality with a = aK,k, b = ^J^■K{k) and 6 = 6q, we get, on ilx,fc, 

^kk < (1 + ((1 + 06)^l,k + 8(4 + e^')7T{k)) . 

For all {m',p') G Al^'^^, let us introduce 

V(m',p')(^6) = ^ E E + ^6)4,fc + 8(4 + e^^)7,{k)) . 

Kem' fceA{p'^) 

We have just proved that the set = f^KeVi, f^keAir^,) ^K,k is an event with probability 

P« (n^) > 1 - 2'^Ci95, d)/{r? log(n)) (39) 

over which 

< (1 + ^5)V\a,p)(^6). (40) 

Let us now fix € and fc € A(rj,). According to Bernstein's Inequality and Inequal- 
ity dSH), there exist events and each with P^-measure > 1 — 2 exp(— 3nA(^(-fC)), 
such that on fil 



K,k 



' 7r(jb) 
Ad(K) 



and on 



- Vci>K,fc(y,)-E, ['3>i^,fc(yi)] 



1=1 



< W6E,, 



< OjEs i^l^Yi)] + (1 + 30f i)^(fc), 



n 



^<i>2,_;,(y,)-E, [$?,,fc(yi)] 



i=l 



< ^mK,k\\io^s [^l,kiYi)\ Xd{k) + \\^K,k\\loUk) 

We thus obtain that Qm = f^KeV^ f^keA{r^,) {^k k ^ ^i: fc) event with probablity 

P.(f^M) > 1 - 4 X 2V(n' log(n)) (41) 

over which 

Ml,, < + ^7(1 - 08)"'lA(r-.)|M2,. + (^7(1 - ^8)"'(1 + 3^8^^) + (1 + 36^^)) ^(n)|A(r, 
M^. < Ml,, + e7|A(r-.)|M2,. + (1 + 3^7 i)7r(r,)|A(r,)! 
M2,, < (1 - 08)"^M2,^ + (1 + 308"^)(1 - e8)"V(r,) 

M2,, < (1 + 08)M2,. + (1 + 308-i)7r(n). (42) 
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Let us set il, = n Oo- H ^Im and 

Co = 1-2^1(1 + 02) 
Ci = 1 + 2^1(1 + 0^1) 

C2 = (l + Co)(l + 03)(l + ^4)(l + e5) 

Cs = 4(1 + Co)(4/3 + 9^'){1 + 04'i)(l - 9s)-' + ^7^7(1 - ^s)"' 

C4 = 3C3 + (20/3)(l + Co)(l + 303-i)(l + 9^') + C7 (1 + 30f 1 + 07(1 + 308"i)(l - ^g)"^) 
C5 = 2(2/3 + 0r') 

Ce = C7 + 4(1 + Co) (4/3 + 9^'){1 + 6^') 
C7 = (20/3)(l + Co)(l + 303-i)(l + 04 1). 



We choose pen such that, on il, and for all {m',p') G A4^^^, 



pen(m', p') = C2V(^m' ,p')i06) + + CMr.)) \Hr,)\ + C5M1 



n 

Thus, combining Inequalities ([32]), ([Ml), ([Ml), (HQ]), (|l2|) with the upper-bounds 

Mi,^ < ^(rO|A(rO|||s||oo and Ms,* < ^(rO||s||oo, 
we obtain, on r2m(C) l~l ^x(C) 1^ 

Colls - SII2 < Ci||s - S(m,p)||i + pen(m,p) + (C6||s||oo + C7) 7r(r*)|A(r*)|-. 

Setting 

C^ = C3(l + 08)+C5(l + 07) 

CI = C3(1 + 308-1) + C5(l + 30fi) 
we deduce from (j42p that, on $7,, 



pen(m,p) < C2F(™,p)(06) + (C^PIU + C^) 7r(r*)|A(r*)|- 



n 



so that, on Om(C) l~l ^x(C)) 



Colls - Sllaln. < Ci||s - S(^^p)||2 + C2V(m,p){^&) 

+ (C^||s|U + Cl)7r(r,)IA(r.)I^^^ 



+ (C6||s||oo + C7)4n)|A(r,)|^. 

n 
Last, we recall that Fubini's Theorem yields, for all random variable U,

$$\mathbb{E}[U] \le \mathbb{E}[U_+] = \int_0^\infty \mathbb{P}(U_+ > \zeta)\,d\zeta = \int_0^\infty \mathbb{P}(U > \zeta)\,d\zeta$$

and we underline that 

dim(S'(„^p)) 



y{m,p){06) < (1 + W [\\Sim,p) - 'HmM + + ^e^Mr,)- 



n 

Therefore, 

CqE, [\\s - sgln,] < Ci||s-S(„,p)||^ + (1 + 6'6)C2E4P(„,p) -s- ■ 



{m,/9) *(m,p)ll2j 

n 

+ (C^P||oo + C^)vr(r.)|A(r.)|^ 

+ 2 (Cellslloo + Cr) 7r(r.)|A(r,)|^. (43) 



There remains to bound the risk of s on 17^. According to (|37p . (|39p and (j4ip . 
P. := F.(f^^) < F,(l^^) + ¥s{ni) + P,(J7^,) < C{95, d)/{r? log(n)). 
From Pythagoras' Equahty and the inclusion of S(^fn,p) iiito 5'(m*,r*)i we deduce 

\\s - s\\l = \\s - S(m,p)ll2 < ||s||2 +X^("i*,r-^). 

Therefore, it follows from Cauchy-Schwarz Inequality that

[\\s - sgtn.] < p.\\s\\l + ^p.Es [x4(m„r^)]. 

Let be some countable and dense subset of {t G »S'(jm.,'r^) s.t. ||t||2 — !}■ Since ^(w,-*^, 7"^^) — 
supjg^^ we deduce from Theorem 12 in |BBLM05] that 

where M is any upper-bound for sup^g^^^ maxi<j<„ 1^(1^) — {t, s)\ and cr^, any upper-bound 
for nsup^g^ Vars(t(Y'i)). Therefore, we obtain 



\ iog(raj n niog{n) J 



vr(r-.)|A(r,)|P| 
log(n) 



hence 

nlog ' (n) 
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Since ||s||oo > 1, we conclude thanks to (P3j) and 

[\\s - sg] < C'{\\s - + C!{E, [Pkp) - + C'Mr.)^^^ 

+ ||.|U.(r.)|A(r.)| f + C'fi + C'^ ] (45) 



where 



C(' = C7i/Co, C^' = (l + 06)C2/Co, C^ = 8(4 + OC2/Co, 
C^ = (C^ + C^)/Co, C^ = 2(C6 + C7)/Co, C'^ = C{e^,d). 

Choosing, for all (m, p) € A4^*^^, W{jn^p) = -Z>{m,/9) ["^li ^-^d taking in (|45p the minimum over 
(m, p) € Ai'^^^ allows to complete the proof. 



6 Proof of Theorem 4

Let us fix cr,p,p' , R, L satisfying the assumptions of the theorem and s G V{cr,p,p' ^R,L). 
For J = J-^, all the partitions given by Theorem [2] belong to A4-^, so according to Proposi- 
tion [5] and Theorem [2] applied with r = [crj + 1, 

[\\s-sg] 

< CiK",d,a,p,L) ( inf /i?22-2fcH{,.) ^^(^^)2^K ^22-2J.d(H(.)/d-{i/p-i/2)+WH(<.)A _ 

[ n } J 

In order to minimize approximately the above infimum, we choose 

h = max{/fc G N s.t. w{r^)2'"^/n < R^2~'^kH{<r)^ 
which is well defined since B?"n/w{ri,) < 1, and thus obtain 

< C{K",d,a,p,L) (^(i?(n/t.(n))-^(-)/'^))''^^'^'''^''^Vi?22-2^*^(^(-)/'^-(i/^'-V2)+)-/^^(-)^ . 

Given the assumptions on and R, the leading term in the right-hand sand is the first 
one. We then conclude thanks to Propostion U] 